Introduction


These  proceedings  are  unusual  in  that  they  consist  of  papers  contributed 
to  a  conference  that  did  not  take  place.  The  Association  of  College 
&  Research  Libraries,  which  was  later  than  usual  in  announcing  its 
own conference, coincidentally scheduled its meeting on the dates we
had  chosen  for  the  1989  Clinic.  Although  quite  a  few  people  registered, 
there  were  not  enough  to  make  the  Clinic  financially  viable,  and  the 
event  was  cancelled. 

Nevertheless,  because  of  the  encouragement  we  received  from 
numerous  individuals,  the  hard  work  of  the  authors,  and  the  importance 
of  the  topic,  we  decided  to  publish  the  contributed  papers  as  though 
the  Clinic  had  actually  taken  place.  Serials  catalogers  should  rejoice! 

The  sequence  of  the  papers  is  the  same  as  that  planned  for  the 
original  presentations,  which  were  arranged  by  size  of  computer: 
mainframes  first,  then  minicomputers,  and  finally  microcomputers. 
Although  mentioned  in  passing,  supercomputers  were  not  included  in 
the  planning  for  this  particular  conference. 

CHARLES  H.  DAVIS 

Editor 

How We Got Where We Are:
A  Brief  Chronology 


The  creation  of  machine-readable  databases  and  computer-based  services 
has  always  been  predicated  on  the  availability  of  appropriate  hardware 
and  software.  During  the  first  generation  (roughly  1949  to  the  late  1950s), 
very  little  happened  because  the  machines  were  slow,  had  relatively 
little  storage  capacity,  and  were  extremely  expensive.  In  addition,  most 
programming  was  done  at  the  machine  level  —  a  tedious  process.  Second 
generation  computers  (from  the  late  1950s  through  the  early  1960s)  used 
transistors  instead  of  vacuum  tubes,  which  meant  they  were  faster,  more 
reliable,  held  more  data,  and  could  be  afforded  by  institutions  smaller 
than  the  federal  government. 

To  facilitate  programming,  a  number  of  higher-level  languages  were 
developed  during  the  early  years.  FORTRAN  was  designed  primarily 
for  scientific  and  engineering  applications;  ALGOL,  the  first  of  the 
so-called  procedure-oriented  languages,  provided  an  internationally 
recognized  structure  for  program  documentation;  LISP  eventually 
proved  valuable  in  studying  artificial  intelligence;  and  COMIT,  the  first 
language  designed  specifically  for  text  processing,  was  used  in 
computational  linguistics  and  early  studies  in  information  retrieval. 

Higher-level  languages  greatly  facilitated  software  development, 
because  programs  using  them  were  shorter,  easier  to  understand,  and 
could  be  used  on  a  variety  of  computers,  unlike  programs  written  at 
the  machine  level.  It  was  also  during  this  period,  in  1958,  that  Hans 
Peter  Luhn  of  IBM  described  the  mechanized  production  of  keyword 
indexes  as  well  as  an  automated  current-awareness  service  called  SDI 
(Selective  Dissemination  of  Information). 

The  language  COBOL  was  introduced  in  1960  and  had  special 
importance  for  libraries.  Unlike  earlier  languages,  it  facilitated  the 
handling of large alphanumeric records and files. It was the first
language  well-suited  for  use  with  MARC  records,  which  were  developed 
by  the  Library  of  Congress  during  the  mid-  to  late  1960s.  Magnetic 
tapes  of  these  records  were  distributed  to  selected  libraries  so  that  the 
use  of  cataloging  information  in  this  format  could  be  studied.  At  this 
time,  considerable  work  was  done  on  the  automation  of  library  technical 
services  such  as  acquisitions,  circulation,  and  cataloging.  Another 
programming  language  also  appeared  in  the  early  1960s:  SNOBOL, 
which  might  best  be  described  as  a  successor  to  COMIT;  it  was  very 
popular  for  text-processing  applications  but  saw  only  limited  use  in 
the  automation  of  library  technical  services. 

The  early  to  mid-1960s  saw  the  transition,  especially  in  scientific 
information  handling,  from  labor-intensive,  error-prone  tasks  to 
automated  processing — often  to  expedite  the  efficient  production  of 
printed  products  (e.g.,  Index  Medicus,  Chemical  Abstracts).  Keyword 
indexing,  SDI,  and  other  batch-mode  processes  became  popular.  This 
period  also  ushered  in  the  third  generation  of  computers,  which  featured 
integrated  circuitry,  greater  emphasis  on  direct-access  storage  (especially 
magnetic  disks),  and  improved  facilities  for  telecommunication. 

New  programming  languages  included  PL/I  and  BASIC.  PL/I 
incorporated  the  numerical  capabilities  of  FORTRAN,  the  file-handling 
of  COBOL,  and  the  most  crucial  text-processing  features  of  COMIT 
and SNOBOL, all in a structure that looked like ALGOL. PL/I has
been  used  extensively  in  library  automation;  BASIC  was  originally 
designed  to  help  students  learn  programming  while  online  to  mainframe 
computers. 

Who  was  creating  databases?  Institutions  and  agencies  of  the  federal 
government,  e.g.,  the  Library  of  Congress  (LC),  the  National  Library 
of  Medicine  (NLM),  the  National  Aeronautics  and  Space  Administration 
(NASA),  the  Atomic  Energy  Commission  (AEC),  and  the  Commerce 
Department;  large  professional  societies,  e.g.,  the  American  Chemical 
Society  (through  the  Chemical  Abstracts  Service)  and  the  American 
Psychological  Association;  some  large  universities  through  grants,  e.g., 
SPIRES  and  BALLOTS  at  Stanford  University;  and  private  enterprise, 
e.g.,  the  Institute  for  Scientific  Information  (ISI)  with  Index  Chemicus 
and  Science  Citation  Index. 

The late 1960s saw the start of large bibliographic utilities such
as OCLC, RLIN (originally BALLOTS), WLN, and UTLAS, all of which
required  third  generation  hardware  and  software  as  well  as  substantial 
improvements  in  telecommunications  technology.  Probably  the  most 
popular  and  certainly  the  largest  venture  of  its  kind,  OCLC  was  not 
regarded  initially  as  the  source  of  a  database  for  online  searching,  but 
rather  as  a  means  of  producing  and  distributing  catalog  cards  for 
individual  libraries. 

The early 1970s were distinguished by the advent of minicomputers
and  the  growth  of  online  search  services  designed  for  efficient  searching 
via  telecommunications  (e.g.,  System  Development  Corporation's 
ORBIT;  Lockheed's  DIALOG;  the  New  York  Times  Information  Bank; 
and  later,  BRS  [Bibliographic  Retrieval  Service]).  The  online  search 
services  featured  Boolean  search  logic  for  interrogating  large  files. 
Minicomputers  permitted  the  development  of  local  and  regional  online 
circulation  systems,  usually  with  truncated  bibliographic  records.  In 
addition,  the  fourth  generation  of  computers  appeared,  characterized 
by  even  higher  processing  speeds  and  greater  storage  capacity.  Another 
new  programming  language,  Pascal,  was  also  introduced.  Named  for 
the  French  philosopher  and  mathematician,  it  is  similar  to  PL/I  but 
is  more  streamlined  and  has  superior  implementations  that  are  now 
available  for  all  sizes  of  computer. 

The late 1970s brought Altair, Radio Shack, and Apple microcomputers:
toys at first, with libraries purchasing a few, mostly for
entertainment  and  as  an  inducement  to  use  other  library  services.  Because 
it  was  a  simple  language  with  few  hardware  requirements,  BASIC  became 
the  most  popular  programming  language  for  these  smaller  machines. 

During  the  1980s,  microcomputers  grew  from  8-  to  16-  and  even 
32-bit  word  machines,  meaning  that  they  quickly  assumed  the  power 
formerly  associated  only  with  minis  and  mainframes.  The  internal 
"clock"  speed  of  these  computers  has  also  increased  dramatically,  from 
4.7  to  33  MHz  and  even  higher.  In  the  early  1980s,  IBM  entered  the 
microcomputer  business  in  a  big  way  with  its  PC  (Personal  Computer) 
using  Microsoft  Corporation's  operating  system,  MS-DOS.  Although 
not  "state-of-the-art,"  the  IBM-PC  became  an  industrial  standard 
because  of  IBM's  enormous  marketing  capabilities.  In  1984,  Apple 
introduced  the  first  of  its  Macintosh  microcomputers,  which,  although 
not  compatible  with  the  IBM  machines,  offered  different  capabilities 
including  graphic  user  interfaces  and  a  "mouse"  for  quick  placement 
of  the  cursor  on  the  computer's  monitor.  Software  manufacturers, 
inspired  initially  by  the  Macintosh  series,  have  begun  to  explore  options 
using  graphic  user  interfaces  as  alternatives  to  the  traditional  command- 
and  menu-driven  systems.  Most  of  the  newer  interfaces  and  software 
packages  are  designed  for  the  larger  and  faster  microcomputers  made 
available  only  recently.  Also  of  considerable  importance  has  been  the 
introduction  of  CD-ROM  (Compact  Disc-Read  Only  Memory).  It 
permits  the  storage  of  about  550  million  characters  on  one  5  1/4  inch 
disc,  can  store  graphics  as  well  as  text,  cannot  be  disrupted  by  magnetic 
fields  (and  therefore  has  archival  potential),  has  made  significantly  more 
data  available  to  individual  users,  and  may  replace  magnetic  tape  as 
the  distribution  medium  of  choice. 

Individuals, businesses, and libraries have purchased microcomputers
for such operations as word processing, spreadsheet analysis,
database  management,  information  retrieval,  and  electronic  mail. 
Widely  available  off-the-shelf  software  packages  (too  numerous  to 
mention)  perform  these  and  other  tasks.  Optical  storage  technology, 
especially  CD-ROM,  has  been  linked  with  micros,  creating  local 
workstations  for  bibliographic  database  searching  without  the  cost  of 
telecommunications.  Hardware  and  software  developments  make 
possible  local  and  regional  online  catalogs  of  full  bibliographic  (MARC) 
records. 

While  there  is  talk  of  a  fifth  generation  of  computers,  it  is  generally 
considered  to  be  in  progress,  and  the  differences  between  generations 
have  become  more  subtle.  Newer  languages  include  Ada,  which  is  similar 
to  Pascal;  PROLOG,  which  is  used  primarily  for  work  in  artificial 
intelligence  and  expert  systems;  and  both  Microsoft  and  Turbo  Pascal, 
which  are  microcomputer-based  supersets  of  standard  Pascal  that  contain 
numerous  useful  string  manipulation  functions.  The  newest  versions 
of  Pascal  also  feature  object-oriented  programming,  which  deliberately 
blurs  the  distinctions  between  programs  and  their  data;  it  is  meant  to 
go  beyond  conventional  procedures  to  simplify  computer  programming 
and  make  it  accessible  to  a  wider  audience.  Intended  primarily  for 
programming  professionals,  the  language  C  and  its  object-oriented 
extension  C++  occupy  a  level  somewhere  between  machine-  and  higher- 
level  programming. 

Current  areas  of  interest  include,  but  are  certainly  not  limited  to: 
studies  of  the  concept  of  user  friendliness,  experiments  with  expert 
systems,  the  use  of  microcomputers  as  intelligent  terminals,  system 
interface  design,  a  reexamination  of  the  roles  of  batch-mode  and  online 
services,  studies  of  the  dichotomy  between  end  users  and  intermediaries, 
large  databases,  full-text  systems,  and  the  library  as  an  access  point 
for  community  databases. 

Libraries and Mainframe Computers, or
When  Do  You  Need  a  747? 


INTRODUCTION 

Historical  Notes  on  Databases 

In  consideration  of  the  long-standing  title  of  these  meetings  as  "clinics 
on  library  applications  of  data  processing,"  we  should  remind  ourselves 
that  data  processing  is  a  means  of  improving  the  work  of  libraries  as 
information-handling  systems.  Information  has  been  defined  as  "data 
placed  in  context"  (Loomis,  1987,  p.  3)  with  the  database  as  one  part 
of  the  context,  and  the  library  another.  We  are  also  concerned  with 
data  from  the  system's  viewpoint,  noting  that  one  goal  of  database 
management  has  been  to  "create  more  independence  of  the  data  from 
the  programs  that  access  them"  (Lucas,  1986,  p.  220). 

These  quotations  highlight  important  aspects  of  how  databases  and 
their  associated  software  have  evolved,  and  how  they  are  viewed  by 
current  developers  and  knowledgeable  users.  Data  are  an  essential 
component  of  information,  and  hence  of  information  systems,  including 
libraries.  Because  of  their  enormous  processing  power  compared  with 
manual  filing  and  retrieval  systems,  computers  can  be  used  to  create 
a  revolution  in  library  services.  It  is  therefore  incumbent  on  librarians 
and  information  specialists  to  understand  and  make  the  best  possible 
use  of  computer  power  in  information  handling. 

Early  computer  systems  used  files  of  data,  but  did  not  treat  these 
files  as  a  coherent  whole,  or  database.  One  advantage  of  a  database 
system  is  the  ability  to  make  a  collection  of  data  available  to  many 
users,  not  unlike  the  goal  of  a  library  to  provide  a  pool  of  resources 
for  a  variety  of  patrons.  IMS,  the  first  database  system  developed  by 
IBM and North American Aviation in the 1960s (Loomis, 1987, p. 177),
represents  a  hierarchical  database  structure,  in  which  a  record  for  a 
specified  type  of  information  can  have  dependent  records.  For  example, 
a  BOOK  record  might  have  COPY  records  providing  details  on  each 
copy  of  a  book  in  the  library.  In  1971,  the  Data  Base  Task  Group  of 
the  Conference  on  Data  Systems  Languages  (CODASYL)  published  its 
description  of  the  network  data  model  (Loomis,  1987,  pp.  131-32).  This 
structure  emphasizes  one-to-many  relationships  (networks)  among 
records  of  different  types.  For  example,  an  ORDERS  record  could  be 
related  to  REQUESTORS  and  SUPPLIERS. 
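
The two structures just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a rendering of IMS or of the CODASYL specification: the class names follow the BOOK, COPY, ORDERS, REQUESTORS, and SUPPLIERS examples in the text, while every field name and sample value is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    """Dependent record: one physical copy of a book (fields are hypothetical)."""
    barcode: str
    location: str


@dataclass
class Book:
    """Parent record in the hierarchy; COPY records hang beneath it."""
    title: str
    copies: list = field(default_factory=list)


@dataclass
class Order:
    """Member record in a network, linked to owners of two different types."""
    item: str


@dataclass
class Requestor:
    name: str
    orders: list = field(default_factory=list)


@dataclass
class Supplier:
    name: str
    orders: list = field(default_factory=list)


# Hierarchical structure: a BOOK with two dependent COPY records.
book = Book("Sample Title")
book.copies.append(Copy("0001", "Main stacks"))
book.copies.append(Copy("0002", "Reserve desk"))

# Network structure: one ORDERS record related to a requestor and a supplier.
order = Order("Sample Title, 2 copies")
requestor = Requestor("Sample Department", [order])
supplier = Supplier("Sample Vendor", [order])

# Hierarchical access: each copy is reached only by way of its book.
copy_locations = [c.location for c in book.copies]

# Network access: the same order is reachable from either owner.
shared = requestor.orders[0] is supplier.orders[0]
```

In both cases the access paths are fixed when the records are linked, which is the limitation the relational model was meant to remove.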

The third database model is the relational model proposed by E.
F. Codd, which found commercial applications in the 1980s (Loomis,
1987, p. 78). In a relational system, the data are seen as flat tables,
with all information on an item presented in one row or tuple. Tables
(also called relations) can be joined, for example, to pull together a
list of orders placed with foreign sources from an ORDERS table and
a  SUPPLIERS  table.  The  relational  approach  has  the  advantage  of 
allowing  any  question  to  be  asked  of  the  database — the  user  is  not  limited 
to  the  retrieval  approaches  anticipated  by  those  who  designed  the 
hierarchy  or  network.  However,  a  relational  system  requires  considerably 
more  computer  power,  especially  to  support  a  large  database. 
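
The join described above can be illustrated with a minimal sketch that uses Python's standard sqlite3 module as a stand-in for a relational system. The column names, the sample rows, and the assumption that "foreign" means a supplier outside the United States are all hypothetical and serve only to make the example run.

```python
import sqlite3

# An in-memory database stands in for a library's relational system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (
    supplier_id INTEGER PRIMARY KEY,
    name        TEXT,
    country     TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    title       TEXT,
    supplier_id INTEGER REFERENCES suppliers(supplier_id)
);
INSERT INTO suppliers VALUES (1, 'Domestic Books', 'US');
INSERT INTO suppliers VALUES (2, 'Librairie Exemple', 'FR');
INSERT INTO orders VALUES (10, 'Title A', 1);
INSERT INTO orders VALUES (11, 'Title B', 2);
INSERT INTO orders VALUES (12, 'Title C', 2);
""")

# Join ORDERS to SUPPLIERS and keep only orders placed with foreign sources.
foreign_orders = conn.execute("""
    SELECT o.order_id, o.title, s.name
    FROM orders AS o
    JOIN suppliers AS s ON o.supplier_id = s.supplier_id
    WHERE s.country <> 'US'
    ORDER BY o.order_id
""").fetchall()
conn.close()
```

Neither table was designed with this question in mind; the join is specified at query time, which is the flexibility the relational approach offers.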

Traditionally,  data  have  been  considered  distinct  from  programs, 
and  common  wisdom  holds  that  the  more  distinct,  the  better.  This 
attitude  has  allowed  development  to  proceed  on  record  format  (e.g., 
the  MARC  record),  record  content  (e.g.,  cataloging  rules),  and  user 
interfaces  (e.g.,  online  catalogs)  without  requiring  that  those  involved 
understand  in  detail  the  procedures  that  will  be  used  to  store  or  retrieve 
information  in  the  database.  This  division  of  responsibility  has  been 
helpful,  but  may  have  created  some  artificial  distinctions  between  data 
and  programs.  Research  in  information  science  has  revealed  clues  as 
to  how  people  use  information;  for  example,  Richard  Trueswell's  (1965) 
finding  that  80  percent  of  the  questions  to  a  system  can  be  answered 
with  20  percent  of  the  system's  resources,  and  recent  studies  of  how 
cognitive  processes  affect  the  search  for  information  (Borgman,  1986). 
When  systems  designers  take  such  findings  into  account,  improvements 
in  both  human-computer  and  program-database  interfaces  can  be 
developed. 


LIBRARY  USE  OF  MAINFRAME  COMPUTERS 

Libraries  use  mainframe  computers  in  two  ways.  At  times,  libraries 
are  customers  or  clients  who  "reach  out  and  touch"  large  databases, 
housed on large computers and serving a large number of customers
(which may be libraries, other institutions, or individuals).
Bibliographic utilities and search services are examples of this kind of use.

A  library  may  also  use  a  mainframe  which  supports  a  local 
automated  system  for  circulation  control,  online  catalog  access,  and 
other  applications.  In  this  case  the  computer  may  be  owned  by  the 
library,  its  parent  institution,  or  a  library  group.  The  computer  is  likely 
to  be  housed  in  the  same  city  or  state  as  the  library.  The  library  is 
likely  to  have  more  to  say  about  planning  for  a  local  system,  and  generally 
deals  with  a  systems  staff  and/or  computer  center  directly  responsible 
for  development  and  maintenance. 

In  June  1988,  a  special  section  in  Information  Technology  and 
Libraries described various experiences in measuring system
performance, including capacity modeling at RLG, response time and
performance analysis with MELVYL, and measuring system
performance with Carlyle. Describing the problems of performance analysis
and  improvement,  Clifford  A.  Lynch  (1988)  notes  that,  "in  most  real 
systems  performance  is  limited  by  a  small  number  of  bottlenecks  at 
any  given  time;  however,  when  one  is  eliminated,  a  new  one  will  limit 
system  performance"  (p.  178).  The  efforts  reported  in  this  special  section 
are  uncommon,  but  all  systems  need  detailed  information  on  current 
system  performance  in  projecting  future  needs.  Speaking  from  the 
library's  perspective,  Julie  Brown  (1988)  advises  other  libraries  to 
"consider  how  to  get  the  necessary  [performance]  information  in  another 
way"  (pp.  184-85). 

In  late  1988,  ten  providers  of  mainframe  services  to  libraries  were 
surveyed  on  the  hardware  and  software  that  support  their  database 
applications,  particularly  the  retrieval/access  aspects  librarians  use. 
Questions  addressed  the  types  of  computer(s)  and  associated  software, 
size  and  annual  growth  of  database(s),  and  number  of  users.  The 
institutions  surveyed  represent  the  variety  of  sources  through  which 
a  library  might  use  a  mainframe  computer.  Online  search  services 
providing  information  were  Chemical  Abstracts,  Dialog,  National 
Library  of  Medicine,  ORBIT,  and  Wilsonline.  Bibliographic  utilities 
were  OCLC,  RLIN,  and  WLN.  The  University  of  Illinois'  LCS/FBR 
and  NOTIS's  installation  at  the  Florida  Center  for  Library  Automation 
represented  online  catalog  and  circulation  systems.  The  results  of  this 
survey  are  given  in  Table  1. 

The respondents use a variety of equipment: five with various types
of IBM mainframes, two with National Advanced Systems computers,
two with Amdahl machines, one with Unisys in addition to IBM, and
one with Xerox and Tandem equipment. The programs for ongoing
operations are written in one or more of the following languages:
Assembly language (different for each machine), eight systems; PL/I,
eight systems; C, two systems; and Pascal, two systems.

TABLE 1
EXAMPLES OF MAINFRAME COMPUTER SYSTEMS USED BY LIBRARIES

System             Computer                    Programming          Number of  Millions     Database  Simultaneous
                                               Languages            Databases  of Records   Growth    Users
                                                                                            Rate

CAS Registry File  IBM 3090, Unisys            Assembler, C, PL/I   1
Dialog             National Advanced Systems   Assembler, PL/I,     311        approx. 194  40%       several hundred
                   XL80, XL60, 9080            Pascal
NLM                IBM 3081, IBM 3084          Assembler, PL/I      26         11.6         3%        250
NOTIS (Florida)    IBM 370 systems (typical)   Assembler            1          5.4          2%        n.a.
OCLC               Xerox Sigma 9, Tandem TXP   Assembler, C         1          18           40%       9200
ORBIT              National Advanced Systems   PL/I                 100        100          40%       n.a.
                   XL70
RLIN               Amdahl 5890                 SPIRES, PL/I,        2          36           14%       1170
                                               Pascal
Univ. of Illinois  IBM 3081                    Assembler, PL/I      31         15           6%        1000
WLN                Amdahl 470                  Assembler, ADABAS,   3          14.7         5%        270 terminals,
                                               PL/I                                                   avg. 110 users
Wilson             IBM 3081                    Assembler, PL/I      22         4.6          11%       30 external,
                                                                                                      175 internal

A  rough  measure  of  size  is  possible  from  the  number  of  records 
contained  in  each  system's  files.  The  largest  system,  Dialog,  reported 
311  databases,  containing  a  total  of  approximately  194  million  records. 
The  smallest  number  of  records  was  reported  by  Wilsonline,  with  22 
databases  and  4.6  million  records.  The  average  (mean)  number  of  records 
in all ten systems is 40.7 million; excluding Dialog, which is unusually
large,  the  average  is  23.7  million  records  per  system.  There  is  a  great 
range  in  number  of  characters  per  record,  from  MARC  cataloging 
records,  to  bibliographic  records  including  abstracts,  to  full  text  of 
articles  in  some  files  supported  by  the  search  services.  In  addition, 
different  systems  support  differing  levels  of  detail  and  flexibility  in 
accessing  the  databases.  For  these  reasons  it  is  not  possible  to  use  number 
of  records  to  make  absolute  comparisons  of  size.  However,  the  number 
of  records  does  reflect  the  complexity  involved  in  database  navigation 
and  record  retrieval. 

The systems grow at widely varying rates, from the average annual
growth rate of 2 percent that NOTIS reported for a "typical system" to
the 40 percent growth reported by Dialog. The bibliographic utilities
averaged growth of 11 percent per year, while the search services averaged
21 percent annual growth.

The  number  of  simultaneous  users  each  system  supports  ranges 
from  205  for  Wilsonline  (of  which  thirty  can  be  outside,  i.e.,  non-Wilson 
searchers)  to  9200  for  OCLC.  It  is  interesting  to  note  that  the  highest 
number  of  simultaneous  users  of  a  search  service  was  "several  hundred" 
for  Dialog,  which  is  presumably  lower  than  the  1170  reported  by  RLIN, 
the  bibliographic  utility  supporting  the  fewest  simultaneous  users. 
Furthermore, users of bibliographic utilities need dynamic access: they
are adding or changing records even as they and other users search
and retrieve from the database.

Advantages  of  Mainframe  Computers  for  Library  Applications 

Every week seems to bring another breakthrough in computer
technology, making units smaller, faster, and less expensive
(measured in instructions per second per dollar). Still, many library
operations  rely  on  big  mainframe  computers.  Why?  The  obvious  reasons 
are  processing  speed  and  the  ability  to  handle  large  files  and  many 
transactions.  With  the  large  numbers  of  users  mentioned  above,  many 
doing  complex  search  operations,  it  is  critical  that  the  system  be  able 
to  receive,  process,  and  respond  to  commands  rapidly.  Simply  retrieving 
information  from  large  numbers  of  disk  drives  requires  a  powerful 
machine;  moreover,  mainframe  architecture  supports  many  independent 
"channels,"  or  disk  controllers,  providing  greater  throughput.  To  date, 
only  mainframe  computers  have  been  able  to  provide  the  speed  of 
execution  needed  to  support  many  users  with  real-time  access  to  large 
files.  Database  machines,  hardware  designed  to  optimize  database 
functions,  have  been  proposed  as  one  way  to  improve  system  performance 
(Salmon,  1984).  To  date,  library  systems  have  most  often  chosen  instead 
to stay with general-purpose computers; and the development of
computer  power  seems  to  more  than  keep  pace  with  the  design 
improvements  offered  in  the  database  machines. 

Some more subtle considerations also speak in favor of mainframes
for library applications. While the size and complexity of databases
are impressive, even more staggering is the investment of storage and
processing  power  required  to  develop  and  maintain  the  indexes  that 
support  the  sophisticated  retrieval  capabilities  to  which  libraries  have 
become  accustomed.  After  years  of  experience  with  the  search  services, 
Boolean  logic  is  no  longer  enough.  Now  we  want  field  restrictions, 
string  searching,  proximity  operations,  and  other  refinements.  The 
problems  of  textual  databases,  especially  full-text  databases,  require  more 
discrimination in the retrieval process. The addition of artificial
near-intelligence will place additional demands on systems, as knowledge
bases  and  search  heuristics  are  developed. 
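
To make the indexing burden concrete, the following sketch builds a small positional index and uses it for Boolean, field-restricted, and proximity searching. It is a toy illustration under stated assumptions, not a description of how any of the surveyed services is implemented: the three sample records, the field names, and the function names are all hypothetical.

```python
from collections import defaultdict

# Three hypothetical records, each with two searchable fields.
records = {
    1: {"title": "online catalog design",
        "abstract": "user studies of catalog search"},
    2: {"title": "circulation systems",
        "abstract": "online circulation and catalog maintenance"},
    3: {"title": "serials control",
        "abstract": "design of an online serials system"},
}

# Positional index: (field, term) -> {record id -> [word positions]}.
# Storing positions is what makes proximity searching possible, and it is
# also why the index can rival the database itself in size.
index = defaultdict(lambda: defaultdict(list))
for rec_id, fields in records.items():
    for field_name, text in fields.items():
        for pos, term in enumerate(text.split()):
            index[(field_name, term)][rec_id].append(pos)


def search(term, field_name=None):
    """Return ids of records containing term, optionally in one field only."""
    names = [field_name] if field_name else ["title", "abstract"]
    hits = set()
    for name in names:
        hits |= set(index.get((name, term), {}))
    return hits


def near(term_a, term_b, field_name, distance=1):
    """Return ids where the two terms occur within `distance` words of each other."""
    postings_a = index.get((field_name, term_a), {})
    postings_b = index.get((field_name, term_b), {})
    hits = set()
    for rec_id in set(postings_a) & set(postings_b):
        if any(abs(pa - pb) <= distance
               for pa in postings_a[rec_id]
               for pb in postings_b[rec_id]):
            hits.add(rec_id)
    return hits


boolean_and = search("online") & search("catalog")   # Boolean AND
title_only = search("online", "title")               # field restriction
adjacent = near("online", "catalog", "title")        # proximity operation
```

Even in this toy case every word occurrence produces an index entry, so the storage and processing needed to build and update the index grow with the full text of the file rather than with the number of records.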

As  databases  become  larger  and  more  complex,  and  as  non-specialists 
use  these  systems,  we  see  the  need  for  more  helpful  or  "interventionist" 
search  software.  Lynch  (1987)  has  described  problems  with  very  large 
bibliographic  databases,  where  the  traditional  search  keys  (e.g.,  subject 
headings)  produce  too  many  hits.  Advanced  retrieval  software  can  specify 
the  complex  search  statements  for  beyond-Boolean  searching  and  provide 
assistance  to  new  or  infrequent  users.  However,  this  sophisticated 
software  will  likely  require  more  computer  speed  and  power. 

As  systems  become  more  powerful  they  must  also  become  less  hostile 
(more friendly) and easier to use. John Sculley's "knowledge navigator"
envisions  such  an  interface,  applying  computer  processing  power  to 
traverse  a  large  file  and  find  useful  information  (Apple,  1988).  It  would 
seem  logical  to  house  the  navigator's  capabilities  in  a  personal 
workstation  rather  than  a  central  mainframe,  but  mainframes  will  need 
to  be  consistent  enough  to  interact  with  a  variety  of  navigators  and 
powerful  enough  to  support  several  sophisticated  users  at  a  time. 

Another  reason  for  library  reliance  on  mainframes  may  be,  as  Dennis 
Reynolds (1985) has observed, that "libraries are generally adapters rather
than  innovators  of  technology"  (p.  159).  This  desire  to  use  proven 
hardware  and  software  has  been  a  prudent  response  to  rapid  changes 
in  technology,  notably  when  computer  generations  were  succeeding  each 
other  with  great  rapidity  in  the  1960s  and  1970s.  Given  libraries' 
responsibilities  to  preserve  information,  a  cautious  or  skeptical  approach 
to  new  developments  may  be  in  order.  However,  one  consequence  of 
this  caution  is  that  libraries  are  not  (or  are  not  often)  on  the  cutting 
edge  of  technical  developments,  but  rather  adapt  existing  technology 
for  library  applications.  Thus  breakthroughs  in  computer  hardware  or 
database management are likely to be fairly well-established by the time
they  are  adopted  by  libraries.  The  trick  is  to  adopt  a  breakthrough 
before  it  has  become  a  dead  end. 

Human  Issues  in  Use  of  Mainframes 

One  of  the  attractions  of  mini-  and  micro-computers  is  the  sense 
of  participation  and  autonomy  possible  for  the  computer's  users.  The 
ultimate  feeling  of  individual  control  is  embodied  in  the  notion  of  a 
"personal  computer."  In  fact,  one  handy  way  of  distinguishing  between 
micros,  minis,  and  mainframes  is  based  not  on  speed  or  power  but 
on  cost,  the  ultimate  determinant  of  ownership.  A  micro  can  fit  into 
a  personal  budget,  a  mini  into  a  departmental  budget,  and  a  mainframe 
into  an  institutional  budget.  When  a  library  uses  a  big,  sophisticated 
computer  system,  it  is  unlikely  to  own  the  computer,  and  in  fact  is 
seldom  the  only  user.  Rather  than  dealing  simply  with  one's  own  designs 
and  implementation  problems  (as  with  a  personal  computer)  or  with 
one  or  two  systems  librarians  (as  is  common  with  many  turnkey  systems), 
the  library  as  an  institution  may  be  working  with  (or  against)  a  large 
number  of  systems  staff  whose  allegiance  is  to  another  part  of  a  university, 
corporation,  city,  or  even  to  an  outside  vendor.  Regardless  of  the  good 
will  and  charitable  emotions  professed  at  the  start  of  such  a  venture, 
there  are  times  when  differences  of  opinion  are  unavoidable. 

The  following  observations  on  how  people  react  to  the  stresses 
of  library  automation  are  based  on  my  own  experiences  and  on 
discussions  with  colleagues  who  have  worked  as  systems  staff  members: 

"Us  vs.  Them" 

Library  staff  members  have  widely  different  expectations  of  an 
automated  system  and  different  degrees  of  willingness  to  make  sacrifices 
for  its  implementation  (Fine,  1985;  Shaw,  1986).  For  various  reasons, 
they  may  need  or  want  a  scapegoat  when  things  go  wrong.  Similarly, 
computer  operators,  trainers,  sales  representatives,  network  librarians, 
and  others  feel  frustrated  by  the  reluctance  of  library  people  to  "give 
the system a break." When these two groups work in different buildings
(or even in different states), report to different supervisors, and may
even  have  different  overall  goals  for  their  work,  it  is  not  surprising 
to  see  differences  develop. 

The  Project  Mentality 

Library  automation  is  often  undertaken  as  a  special  project.  There 
are  many  milestones  to  full  implementation  of  the  system,  and  significant 
amounts  of  energy  are  expended  and  goodwill  due  bills  called  in. 
However,  there  is  generally  little  attention  to  the  ongoing  management 
of  the  automation  system  and  the  demands  this  will  place  on  library 
staff  and  users.  The  initial  emphasis  on  implementation  is  dwarfed 
after  five  years'  ongoing  effort  and  costs. 

Limited  Reward  for  Preventing  A  Catastrophe 

Systems  people  spend  a  great  deal  of  time  doing  things  that  everyone 
hopes  the  users  will  not  see.  Developing,  debugging,  and  testing  new 
applications  is  demanding.  Since  each  large  bibliographic  system  is 
unique,  many  even  minor  changes  involve  developing  and  testing  new 
programs. 


CONCLUSION 

Libraries  need  mainframe  computers  and  powerful  database 
management  systems  to  put  data  in  context:  to  create  information.  The 
notion  of  a  "databank"  or  "database"  has  been  around  since  at  least 
the  1960s.  In  1973,  Charles  W.  Bachman  urged  a  revolution  in  database 
management,  from  a  computer-centered  to  a  database-centered 
viewpoint.  It  may  now  be  time  for  the  next  revolution,  from  database- 
centered  to  a  view  that  encompasses  the  information  system  of  which 
the  database  is  a  part.  This  integrated  information  outlook  must  stress 
the  seamless  integration  of  hardware,  software,  databases,  and 
intelligence  to  provide  the  information  each  user  needs. 

The  747  airplane  mentioned  in  the  title  was  chosen  as  a  symbol 
of  size  and  power.  As  we  consider  the  impressive  power  of  the  computers 
and  systems  available  to  libraries,  we  should  understand  the  importance 
of  the  environment  in  which  a  powerful  system  works.  Recent 
experiences  with  airliners  have  alerted  us  to  the  need  for  ongoing 
attention  to  the  human  and  mechanical  aspects  of  maintenance  and 
planning. 

The  demands  of  information  processing  have  brought  together 
technology  ranging  from  micros  to  supercomputers.  Library  automation 
is  generating  communication  among  people  who  previously  had  little 
in  common,  and  in  some  cases  is  reducing  the  differences  between  them. 
It  may  be  time  to  reconsider  distinctions,  such  as  that  between  data 
and  programs,  which  have  traditionally  been  considered  reasonable. 
It  is  not  clear  that  the  next  generation  of  systems  will  place  the  librarian 
in  the  pilot's  seat  of  a  Concorde  or  a  747,  but  it  is  time  to  realize  some 
of  the  great  expectations  librarians  have  long  had  for  automation. 

Object-Oriented  Databases  for  Libraries 


INTRODUCTION 

If  the  author  had  been  as  creative  as  Debora  Shaw  (these  proceedings), 
the  title  of  this  paper  could  have  been  "Libraries  and  Object-Oriented 
Database  Systems,  or  When  Do  You  Need  an  SST?"  However  clever 
the  title,  the  intent  of  the  following  discussion  is  to  examine  some 
of  the  latest  developments  in  database  technology  and  to  conjecture 
how  they  might  be  applied  to  information  processing  within  the  library 
world.  The  main  new  development  that  will  be  considered  is  an  object- 
oriented  database  system.  But  other  new  developments  will  be  addressed 
as  well. 

Database  systems  and  practice  have  developed  to  satisfy  an 
organization's  critical  needs  for  operational  data.  A  database 
management  system  (DBMS)  is  supposed  to  make  it  easy  to  share  and 
protect  vital  data  and  information.  The  designers  of  such  systems  are 
charged  to  get  all  the  right  information  into  the  system,  make  it  easy 
for  multiple  sub-organizations  to  get  at  it  and,  yet,  prevent  the  wrong 
eyes  from  seeing  the  parts  of  it  they  have  no  right  to.  DBMSs  are  further 
charged  to  make  sure  that  a  minimum  of  crucial  information  is  ever 
lost  by  accident  or  disaster,  or  is  destroyed  by  miscreants. 

Libraries  are  certainly  organizations  very  dependent  on  information. 
In  fact,  the  case  could  be  made  that  information  is  more  their  lifeblood 
than  it  is  for  airlines  or  banks.  Libraries  exist  to  provide  information 
to  patrons  in  the  form  of  books  and  other  media,  direct  services  of 
reference  specialists  and,  more  recently,  electronic  delivery  of 
information.  However,  the  organization's  operational  information  in 
databases,  as  opposed  to  self-descriptive  data  about  the  database  itself, 
is  always  about  something  else.  It  is  about  a  seat  reserved  on  flight 
507  by  S.  Jones  on  June  23.  Or  it  is  about  Acme  Corporation's  loan 
that  is  due  on  the  25th  of  July.  Libraries  also  need  to  store  information 
about  their  activities.  Accessions,  cataloging,  acquisitions,  holdings,  and 
circulation  data  all  need  to  be  recorded  and  managed.  This  is  information 
that  runs  the  library  and  that  is  not  usually  completely  available  to 
patrons. 

So  in  the  case  of  libraries,  database  systems  perform  at  multiple 
levels  of  activity.  They  are  used  by  local  libraries  to  hold  and  process 
operational  data.  They  are  national  depositories  of  both  operational 
data,  as  in  OCLC  holding  bibliographic  data  required  for  cataloging, 
and  pure  information  as  in  the  Dialog  or  Nexis  services.  All  these 
observations  lead  to  the  conclusion  that  effective  databases  are  crucial 
to  a  library's  successful  daily  operation.  Therefore,  it  is  safe  to  assume 
that  many  systems  are  already  in  place  and  serving  their  libraries  well. 
Who  could  ask  for  anything  more?  Well,  most  systems  in  place  could 
be  improved  and  some  systems  are  only  barely  keeping  their  heads  above 
water.  So  what  are  the  technical  problems  facing  these  systems? 


PROBLEMS  OF  LIBRARY  DATA  PROCESSING 

Any  organization  doing  information  processing  encounters 
difficulties  from  time  to  time.  Often  they  are  products  of  outside  pressure 
on  the  organization  to  perform.  But  internal  circumstances  may  also 
dictate  the  level  of  data  processing  happiness.  The  three  severest  problems 
that  face  designers  of  data  processing  and  database  systems  applied  to 
the  library  world  are: 

1.  the  need  for  extremely  sophisticated  searches  of  large  groups  of  sources; 

2.  high-volume  transaction  processing  (Lynch,  1987);  and 

3.  complex  entities  to  model  (Bancilhon,  1988). 

Clearly,  not  every  application  faces  all  three  of  the  problems 
simultaneously.  Many  systems  often  are  under  stress  from  patron 
demands  alone.  And  of  course,  some  systems  are  still  composed  of  manual 
card  files  and  people.  However,  one  of  the  three  problems  is  faced  by 
almost  all  large  institutions.  Some  organizations  such  as  the 
bibliographic  utilities  may  confront  all  three  simultaneously.  Let  us 
look  at  each  problem  separately. 

Search  in  conventional  database  systems  is  fairly  benign  and  simple: 
To  find  all  the  payments  under  contract  number  CN82723,  turn  to  your 
computer  terminal  and  request  a  report  keyed  on  the  contract  number. 
Shortly  on  your  screen  or  the  printer,  the  list  appears.  A  library  patron's 
requests  for  information  are  often  not  so  simple.  A  typical  search  might 
be  to  look  for  all  recent  books  and  articles  about  object  orientation. 
The  reference  librarian  cannot  just  turn  to  a  terminal  and  type  a  query. 
First  of  all,  there  is  no  single  system  that  covers  all  published  materials. 
So  expert  knowledge  is  applied  just  to  determine  which  bibliographic 
resources  to  consult.  But  once  a  set  of  them  is  chosen,  it  is  still  a  difficult 
task  to  form  good  search  strategies  for  each  source.  Does  one  search 
for  "object  orientation"  or  "object"  or  possibly  "object-oriented"? 
Probably,  the  reference  librarian  must  further  question  the  patron  for 
more  details  of  the  real  subject  desired,  which  turns  out  to  be  recently 
offered,  commercial,  object-oriented  databases  in  the  U.S.  Being  sure 
that  what  is  retrieved  is  exactly  what  is  sought,  and  that  the  final  search 
result  is  complete  is  most  difficult  and,  all  too  often,  most  expensive. 

Needing  to  complete  huge  numbers  of  transactions  per  day  is  a 
real  problem  for  library  service  organizations.  Shaw's  survey  of 
library  database  applications  (these  proceedings)  reflects  this 
problem:  she  tabulates  characteristics  of  ten  large  systems,  and 
none  of  them  uses  a  relational  DBMS.  Commercial  relational 
systems  provide  adequate  tools  to  meet  many  business  needs.  They, 
however,  do  not  provide  high  rates  of  transactions.  They  are  fairly  simple 
to  construct  and  use,  but  are  still  too  slow  for  high-volume  operation. 
Libraries  and  businesses  are  left  to  either  "roll  their  own"  or  adopt 
systems  using  older  technology  but  offering  a  large  transaction  rate. 
In  the  future,  the  newer  database  systems  will  mature  and  offer  their 
owners  better  help  with  numerous  transactions. 

Complexity  is  a  problem  in  and  out  of  the  library  world.  The  auto 
engineer  sitting  at  a  high-powered  workstation  would  like  to  keep  his 
car  design  ideas  well  organized.  A  car  has  hundreds  of  subassemblies 
and  thousands  of  individual  parts.  The  design  engineer  may  have  several 
versions  of  a  design  for  any  of  the  subassemblies  or  parts.  It  is  a  fairly 
difficult  task  to  get  a  database  system  to  store  all  the  versions  and  all 
the  data  that  describes  the  car  design.  Librarians  also  face  relatively 
complex  entities  that  must  be  described  in  databases.  The  MARC  format 
itself  is  complex  and,  thus,  the  storage  of  bibliographic  information 
is  no  simple  task,  especially  if  there  are  added  constraints  of  efficient 
storage  and  rapid  retrieval.  Conventional  database  systems  groan  at  being 
bent  to  achieve  the  modeling,  storage  and  manipulation  of  such  complex 
objects. 

Complex  entities  and  complex  relationships  are  simple  problems 
for  an  object-oriented  database  system  (OODBS).  It  is  specifically 
designed  to  handle  complexity  more  easily  than  relational  systems  can. 
The  next  section  describes  the  nature  of  OODBSs  and  how  they  deal 
with  complexity. 

Object  Orientation 

To  understand  the  promise  of  the  new  technology  inherent  in  an 
OODBS,  a  few  basic  concepts  must  be  grasped  (Peterson,  1987;  Shriver 
&  Wegner,  1988).  These  concepts  may  be  understood  by  answering  some 
initial  questions.  First,  what  are  objects,  anyway,  and  how  do  they 
behave?  Second,  how  can  objects  fit  into  a  database  system?  Third,  how 
does  one  model  one's  world  with  an  object-oriented  system?  And  last, 
what  are  the  implications  of  object  orientation  for  system  development 
and  maintenance? 

An  object  in  an  object-oriented  system  is  the  computer  reflection 
of  a  real  world  object  (see  Figure  1).  This  is  especially  true  of  database 
objects,  whose  job  is  to  model  a  thing  in  the  world  that  one  wishes 
to  keep  track  of.  Computer  objects,  like  their  counterparts,  persist  and 
have  a  unique  identity,  which  means  no  matter  how  much  they  look 
alike,  they  are  always  distinct.  Real  world  objects  are  like  this,  too: 
take  two  new  baseballs  off  the  shelf  at  a  sporting  goods  store  and  they 
are  distinct,  even  though  they  look  and  measure  the  same. 
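
The difference between looking alike and being the same object can be sketched in a few lines of Python. This is only an illustration: the `Baseball` class and its two fields are hypothetical, and the sketch shows identity only, not the persistence that a database object would also have.

```python
from dataclasses import dataclass


@dataclass
class Baseball:
    # Hypothetical attributes; two balls may share every measured value.
    circumference_cm: float
    weight_g: float


ball_a = Baseball(circumference_cm=23.0, weight_g=145.0)
ball_b = Baseball(circumference_cm=23.0, weight_g=145.0)

# Same measurements, so the two objects compare equal by value...
same_state = ball_a == ball_b
# ...but each keeps its own identity, like two balls taken off the shelf.
same_object = ball_a is ball_b
```

Here `==` compares the recorded data, while `is` asks whether two names refer to one and the same object.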


Figure  1.  An  object  [diagram  label:  Message  Interface] 


An  early  commercial  OODBS  was  Vbase,  for  which  there  are  a 
whole  series  of  manuals  and  guides  (Ontologic,  Inc.,  1987).  In  addition, 
Hewlett-Packard  is  working  on  a  commercial  OODBS  called  Iris  DBMS 
(Fishman  et  al.,  1987).  Dome  Software  offers  a  networked  system  of 
workstations  and  an  OODBS  server  running  on  a  DEC  VAX  called 
DOME  (Distributed  Object  Management  Environment)  but  previously 
called  LCE  (Kachhawaha  &  Hogan,  1987);  Servio  Logic  offers  an  OODBS 
called  GemStone  (Maier  et  al.,  1986);  and  both  Servio  Logic  and  Park 
Place  Systems,  an  offshoot  from  Xerox's  Palo  Alto  Research  Center, 
have  tried  to  build  an  OODBS  by  making  objects  created  while  running 
the  programming  language  Smalltalk  persist  over  long  periods 
(Copeland  &  Maier,  1984). 

A  computer  object  is  a  tool  for  encapsulation  of  data  and  operations. 
The  data  is  used  to  record  things  about  the  real  object,  while  the 
operations  may  tell  the  outside  world  about  the  data  or  manipulate 
it.  For  example,  let  us  go  back  to  the  design  engineer  keeping  track 
of  his  new  car.  One  object  that  might  be  in  the  OODBS  for  car  design 
is  a  door.  The  designer  could  store  information  about  materials  to  make 
the  door  and  its  physical  characteristics  such  as  measurements.  To  aid 
in  viewing  the  object,  the  system  might  well  contain  operations  to  display 
the  data  both  as  numbers  and  to  draw  the  door  on  a  bit-mapped  screen. 
So  an  operation  might  be  a  "read"  that  would  get  the  numbers  in  the 
data  section  of  the  object,  but  could  also  have  operations  like  "draw" 
or  "rotate"  as  well. 

Because  encapsulation  builds  a  wall  around  the  data,  a  user  can 
only  ask  questions  about  them  and  request  actions  be  taken  on  them 
through  the  operations.  A  computer  scientist  describes  what  we  have 
so  far  as  an  "abstract  data  type."  Object  orientation  actually  simplifies 
the  usual  concept  of  an  abstract  data  type.  In  an  object-oriented  system, 
the  interaction  with  objects  comes  through  messages  to  objects.  Messages 
allow  a  uniform  interface  to  all  objects,  because  each  message  has  the 
same  form  and  any  message  may  be  sent  to  any  object.  Of  course,  if 
the  object  cannot  understand  the  message,  it  will  just  reply  that  it  does 
not  know  the  operation  the  user  wanted.  A  normal  abstract  data  type 
has  a  more  complicated  interface  made  up  of  all  the  public  operations 
on  the  data  type,  which  is  not  regular.  To  communicate  one  must  know 
the  exact  rules  of  these  operations.  Objects  are  simpler,  because  a  message 
always  has  the  form: 

target-object  selector  {optional  parameters} 

Here  the  "target-object"  must  be  the  name  of  some  existing  object, 
"selector"  is  the  name  of  an  operation,  which  is  usually  called  a  method, 
and  the  "optional  parameters"  are  any  data  that  the  method  needs  to 
work  on.  For  example,  in  an  object-oriented  graphics  package,  a  typical 
message  might  be: 

triangle(T23)  rotateRight  105, 

which  would  ask  the  triangle  named  "T23"  to  rotate  itself  to  the  right 
by  105  degrees.  We  could  also  send  the  message: 

user(Sally)  rotateRight  47. 

Chances  are,  the  user  objects  are  in  the  system  just  to  identify  people 
who  can  log  on  and  an  instance  of  the  user  class  cannot  rotate.  This 
is  an  example  of  the  uniform  interface — even  though  a  user  cannot 
rotate  itself,  the  message  is  still  legal,  is  delivered,  but  gets  the  reply 
that  "rotateRight  is  not  a  selector  for  objects  of  the  class  user." 
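
A rough Python sketch of this uniform interface follows. It is not taken from any of the systems cited here: the `send` method, the `DomainObject` base class, and the `heading` attribute are all assumptions made for illustration. Every object accepts every message, and an object that lacks the requested method simply replies that it does not know the selector.

```python
class DomainObject:
    """Hypothetical base class: any message may be sent to any object."""

    def send(self, selector, *params):
        method = getattr(self, selector, None)
        if selector.startswith("_") or not callable(method):
            # The message is legal and is delivered, but it is not understood.
            class_name = type(self).__name__.lower()
            return f"{selector} is not a selector for objects of the class {class_name}"
        return method(*params)


class Triangle(DomainObject):
    def __init__(self, name):
        self.name = name
        self.heading = 0  # degrees; an assumed attribute for the example

    def rotateRight(self, degrees):
        self.heading = (self.heading + degrees) % 360
        return self.heading


class User(DomainObject):
    def __init__(self, name):
        self.name = name


t23 = Triangle("T23")
sally = User("Sally")

reply_triangle = t23.send("rotateRight", 105)   # the triangle rotates itself
reply_user = sally.send("rotateRight", 47)      # delivered, but not understood
```

The caller needs to know only the one message form; whether a given selector means anything is decided by the receiving object.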

Objects  group  together  to  form  classes.  A  class  is  both  a  container 
of  all  like  objects  and  a  way  to  make  a  definition  of  an  object.  In  fact, 
an  object  comes  into  existence  when  a  class  object  is  sent  a  message 
such  as  "triangle  new."  This  asks  the  class  object  "triangle"  to  run  the  method 
that  creates  a  new  instance  of  the  class.  Having  classes  allows  a  group 
of  objects  to  be  organized  by  classifications  in  a  class  hierarchy.  In  an 
application  to  manage  a  zoo,  we  might  see  a  class  hierarchy,  as  in  Figure 
2.  The  class  hierarchy  establishes  an  IS  A  relationship.  A  lion  is  a  cat. 
The  class  hierarchy  allows  a  new  property  of  objects — they  inherit  the 
characteristics  of  their  ancestor  classes.  Thus,  if  a  cat  object  holds  a 
field  or  part  "HairColor"  to  record  the  animal's  color,  then  the  definition 
of  the  lion  class  would  not  need  such  a  field  because  it  would  be  inherited 
from  the  cat  class  definition.  Similarly,  any  methods  that  are  defined 
for  an  ancestor  are  inherited  by  a  subclass.  So,  if  there  were  a  method 
that  looked  at  a  whole  class  of  birds,  checking  each  bird  instance  for 
eggs  to  average  the  number  of  eggs  laid  per  season,  it  would  be 
automatically  available  to  the  Eagle  class  and  the  Hawk  class.  The 
developer  would  have  to  expend  no  effort  to  add  it.  Inheritance  is  a 
powerful  tool  for  organizing  a  data  model  and  for  reducing  unwanted 
duplication  of  effort. 
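
The zoo example might be sketched as follows. The class names come from the text, but the attribute names, the sample animals, and the `average_eggs` method are assumptions. Python classes do not automatically hold all of their instances, so the birds are passed in as a list.

```python
class Animal:
    def __init__(self, name):
        self.name = name


class Cat(Animal):
    def __init__(self, name, hair_color):
        super().__init__(name)
        self.hair_color = hair_color  # the "HairColor" part, defined once


class Lion(Cat):
    pass  # a lion IS A cat: hair_color is inherited, not redeclared


class Bird(Animal):
    def __init__(self, name, eggs_this_season):
        super().__init__(name)
        self.eggs_this_season = eggs_this_season

    @classmethod
    def average_eggs(cls, birds):
        """Average eggs per season over the birds belonging to this class."""
        counts = [b.eggs_this_season for b in birds if isinstance(b, cls)]
        return sum(counts) / len(counts) if counts else 0.0


class Eagle(Bird):
    pass  # average_eggs is available here with no extra effort


class Hawk(Bird):
    pass


leo = Lion("Leo", hair_color="tawny")
flock = [Eagle("E1", 2), Eagle("E2", 3), Hawk("H1", 4)]
```

Calling `Eagle.average_eggs(flock)` or `Hawk.average_eggs(flock)` runs the single method written for `Bird`, restricted to the instances of the class that received the message.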

In  a  library  OODBS,  there  might  be  classes  that  represented 
bibliographic  entities,  patrons,  and  services  such  as  being  at  the  bindery. 
Imagining  that  the  system  is  for  a  state  university  library,  the  following 
pieces  of  the  class  hierarchy  might  exist:  classes  for  patrons  (see  Figures 
2  and  3)  and  classes  for  bibliographic  records  (see  Figure  4).  So  far, 
objects  have  been  said  to  contain  data,  but  the  nature  of  it  has  not 
been  related.  Figure  5  shows  a  little  of  what  an  instance  of  the  patron 
class  might  contain.  It  has  some  simple  data  such  as  strings  for  parts 
such  as  names,  but  there  can  be  more  complicated  items.  Some  data 
is  convenient  to  aggregate.  In  the  example,  an  address  is  a  part  of  a 
patron  object,  but  it  is  an  owned  instance  of  an  address  class.  Being 
an  owned  instance  allows  the  patron's  address  both  to  act  as  a  "chunk" 
of  information  with  its  own  name  "PatronAddr"  and  to  have  internal 
structure  that  may  be  used  in  searching  or  organizing  reports  on  patrons. 
The  next  parts  are  even  more  interesting.  The  "Dept"  part  is  a  reference 
to  a  separate  or  shared  instance  of  the  department  class.  Clearly,  an 
individual  patron  should  not  own  the  information  about  a  university 
department.  That  information  should  stand  alone.  Aggregation  and 
reference  are  examples  of  relationships  between  objects  that  are  strongly 
coupled  or  weakly  coupled.  If  patron  Sally  Jones  is  in  the  Economics 
Department,  deleting  her  object  should  not  remove  any  information 
inherent  to  the  department,  but  it  would  be  quite  all  right  to  also 
automatically  delete  her  owned  address  object.  It  only  pertains  to  Sally, 
while  other  objects  may  well  wish  to  relate  themselves  to  the  Economics 
object  by  reference. 
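
The difference between an owned part and a referenced object shows up most clearly on deletion. The sketch below assumes a toy `Store` class standing in for the database; the street and city values are placeholders, and none of these names come from an actual OODBS product.

```python
class Department:
    def __init__(self, name):
        self.name = name


class Address:
    def __init__(self, street, city):
        self.street = street
        self.city = city


class Patron:
    def __init__(self, name, street, city, dept):
        self.name = name
        # Owned instance: created for, and living only as long as, this patron.
        self.patron_addr = Address(street, city)
        # Reference: the department exists independently and may be shared.
        self.dept = dept


class Store:
    """Hypothetical stand-in for the database: the set of live objects."""

    def __init__(self):
        self.objects = set()

    def add(self, obj):
        self.objects.add(obj)

    def add_patron(self, patron):
        self.objects.update({patron, patron.patron_addr})

    def delete_patron(self, patron):
        # Strong coupling: the owned address is deleted with its owner.
        self.objects.discard(patron)
        self.objects.discard(patron.patron_addr)
        # Weak coupling: patron.dept is only referenced, so it is left alone.


store = Store()
economics = Department("Economics")
store.add(economics)
sally = Patron("Sally Jones", "12 Elm St", "Urbana", economics)  # placeholder address
store.add_patron(sally)
sally_addr = sally.patron_addr
store.delete_patron(sally)
```

After the deletion the Economics object is still in the store, while Sally and her address are gone.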


Figure  2.  Class  hierarchy 


Figure  3.  Patron  subtree 

Figure  4.  Bibliographic  classes 


Patron 
    Name:        Bill  Caxton 
    IDnum:       444-46-3333 
    Dept:        [reference  to  Dept] 
    PatronAddr: 

[Other  objects  shown  in  the  figure:  Dept,  Info-Items] 

Figure  5.  Patron  class  definition 


Another  useful  kind  of  part  is  a  set  or  collection,  which  allows 
an  element  of  an  object  to  have  variable  size.  A  set  may  contain  either 
owned  instances  of  a  class  or  references.  In  the  library  example,  the 
part  "CheckedOut"  is  a  set  of  references  to  material  that  the  patron 
has  checked  out.  This  part  can  contain  zero  or  more  references  to  the 
Info-Item  class  of  bibliographic  information.  Each  instance  of  the  Info- 
Item  class  corresponds  to  a  shelflist  card  in  a  manual  library  system. 
A  sophisticated  OODBS  would  allow  a  search  for  the  objects  that 
reference  an  Info-Item.  So  the  checkout  method  could  check  to  make 
sure  no  one  has  this  particular  item  currently  checked  out.  The  "Fines" 
part  is  an  example  of  a  set  of  owned  objects.  It  would  contain  instances 
of  a  class  that  recorded  fines  that  are  due.  If  the  Patron  instance  "Bill 
Caxton"  is  deleted  from  the  system,  all  information  about  his  fines 
would  disappear,  but  information  about  his  department  or  books  he 
might  have  checked  out  will  not  be  lost. 
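
A possible sketch of these collection parts is given below. The part names "CheckedOut" and "Fines" follow the text; the `Library` class, its reverse lookup `holders_of`, the second patron, and the call number are assumptions made for the example.

```python
class InfoItem:
    def __init__(self, call_number):
        self.call_number = call_number


class Fine:
    def __init__(self, amount):
        self.amount = amount


class Patron:
    def __init__(self, name):
        self.name = name
        self.checked_out = set()  # "CheckedOut": references to InfoItem objects
        self.fines = []           # "Fines": owned Fine objects


class Library:
    """Hypothetical container playing the role of the OODBS."""

    def __init__(self):
        self.patrons = []

    def holders_of(self, item):
        """Reverse lookup: which patrons currently reference this item?"""
        return [p for p in self.patrons if item in p.checked_out]

    def checkout(self, patron, item):
        if self.holders_of(item):
            return False  # someone already has this particular item
        patron.checked_out.add(item)
        return True

    def delete_patron(self, patron):
        self.patrons.remove(patron)
        patron.fines.clear()        # owned objects disappear with the owner
        patron.checked_out.clear()  # references are dropped; the items survive


library = Library()
bill = Patron("Bill Caxton")
ann = Patron("Ann Reader")      # hypothetical second patron
library.patrons += [bill, ann]
book = InfoItem("QA76.9.D3")    # placeholder call number

first_attempt = library.checkout(bill, book)   # succeeds
second_attempt = library.checkout(ann, book)   # refused: Bill holds the item
```

Deleting Bill afterwards removes his fines but leaves `book` intact, so Ann can then check it out.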

The  above  example  shows  the  basic  tools  that  OODBSs  offer  for 
modeling  the  real  world.  An  object  may  contain  simple  data,  single 
aggregations,  or  collections  of  aggregations.  It  may  be  related  to  another 
independent  object  by  a  reference  as  a  single  part  or  to  many  other 
objects  by  a  collection  of  references.  All  these  tools  give  the  developer 
of  an  OODBS  the  ability  to  create  a  design  that  more  naturally  captures 
the  relationships  found  in  the  organization.  It  gives  the  developer  a 
much  better  chance  to  get  the  design  right.  A  developer  using  a  relational 
tool  must  discover  the  same  real  relationships,  but  is  faced  with  the 
more  perilous  task  of  translating  the  natural  model  of  the  system  into 
a  set  of  relational  tables.  This  must  be  done  very  carefully  to  avoid 
nasty  design  faults. 

Traditional  computer  systems  built  around  databases  have  always 
had  two  main  divisions.  The  database  has  been  analyzed  and  designed 
looking  at  the  needs  for  long-term  data  storage  and  for  processing. 
Separately,  a  group  of  application  programs  is  written  to  actually  do 
the  needed  processing.  Thus,  traditional  systems  have  always  made  a 
strong  separation  of  data  and  operations.  The  result  has  been  extremely 
high  maintenance  costs.  Every  time  an  error  in  the  original  design  is 
found,  or  when  the  users  want  an  alteration  in  that  design,  massive 
changes  must  be  made.  Some  systems  have  hundreds  of  programs  that 
depend  on  the  database  design  to  work  properly.  Change  the  structure 
of  the  database  and  many  or  all  of  the  programs  must  be  altered  to 
work.  In  an  object-oriented  system,  much  of  this  change  can  be  avoided. 
In  an  OODBS,  the  code  to  manipulate  data  stays  with  the  data.  It  is 
in  one  place  and  is  easier  to  maintain.  If  the  structure  of  data  in  an 
object  changes,  say,  by  having  a  new  part  added,  most  application 
programs  can  completely  ignore  that  structural  change.  They  depend 
on  messages  to  get  data  from  an  object.  Once  the  object  has  changed, 
the  old  messages  will  behave  the  same,  if  they  are  specific.  For  example, 
if  the  message  was  to  a  patron  instance  asking  for  the  patron's  address, 
it  will  still  work  even  if  a  new  part,  birthdate,  has  been  added  after 
"IDnum."  What  object  orientation  does  is  hide  the  details  of  actual 
storage  and  manipulation,  so  no  program  need  depend  on  them.  There 
is  no  longer  any  public  physical  database  design;  it  is  a  private  matter 
known  only  to  the  system  designers.  What  is  public  is  the  existence 
of  the  data  and  how  to  get  at  it. 
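
The point about structural change can be illustrated with two versions of a patron class. This is a sketch under stated assumptions: the class names `PatronV1` and `PatronV2`, the `mailing_label` program, and the address and birthdate values are invented for the example. The second version adds a birthdate part and stores the address differently, yet a program that relies only on the address message is unaffected.

```python
class PatronV1:
    def __init__(self, name, idnum, address):
        self._name = name
        self._idnum = idnum
        self._address = address  # stored as one string

    def address(self):
        return self._address


class PatronV2:
    """Same public message, new part, different internal layout."""

    def __init__(self, name, idnum, birthdate, street, city):
        self._name = name
        self._idnum = idnum
        self._birthdate = birthdate        # new part added after IDnum
        self._addr_parts = (street, city)  # private storage has changed too

    def address(self):
        return ", ".join(self._addr_parts)


def mailing_label(patron):
    """An application program that depends only on the address message."""
    return f"To: {patron.address()}"


old = PatronV1("Bill Caxton", "444-46-3333", "12 Elm St, Urbana")
new = PatronV2("Bill Caxton", "444-46-3333", "1950-01-01", "12 Elm St", "Urbana")
```

`mailing_label` produces the same result for both objects because it never touches the private fields.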

Object  orientation  for  databases  offers  much  promise  to  system 
designers  and  users  alike.  It  will  increase  the  productivity  of  the 
computing  staff  so  that  they  get  more  work  done,  more  effectively 
and  with  fewer  errors.  Thus,  it  will  improve  the  overall 
quality  of  the  library's  software.  In  the  long  term,  OODBSs  should 
be  easier  to  maintain  and  cost  less.  Being  newer,  they  also  come 
with  better  tools  for  user  access.  They  are  more  likely  to  have 
fancier  and  friendlier  interfaces,  built  on  visual  metaphors  that  users 
will  find  easier  to  like  and  use.  The  developer  will  be  given  tools  to 
create  tailored  interfaces  that  meet  the  exact  needs  and  skills  of  the 
end  users. 

Other  Database  Research  Directions 

Database  researchers  have  not  all  been  working  at  object  orientation. 
Research  has  also  taken  other  directions  for  its  explorations.  This  section 
outlines  some  likely  areas  of  work  in  the  near  future. 

An  issue  that  is  a  central  facet  of  OODBSs  is  the  marriage  of  data 
and  operations.  Some  database  researchers  are  not  interested  in 
accomplishing  a  union.  They  prefer  to  concentrate  on  the  data  modeling 
possibilities  of  systems  that  are  richer  than  the  relational  paradigm, 
but  leave  the  operations  out.  Most  of  this  sort  of  research  falls  under 
the  heading  of  the  nested  relational  model,  which  provides  a  relational 
system  in  which  the  restriction  of  allowing  only  simple  or  atomic  values 
in  the  columns  of  a  table  is  relaxed.  A  relation  with  only  atomic  values 
is  said  to  be  in  first  normal  form  and,  thus,  a  nested  relational  table 
is  said  to  be  a  non-first  normal  form  relation.  The  nested  relational 
approach,  like  the  object-oriented  one,  permits  a  designer  to  capture 
more  naturally  the  complexity  of  a  real-world  problem  domain.  When 
sets  of  some  entity  occur,  for  example,  when  a  patron  has  multiple 
fines,  the  table  holding  data  about  the  person  may  have  a  column  that 
contains  a  subtable  of  all  the  fine  data.  In  a  traditional  relational  system, 
a  second  table  would  have  to  be  created  and  a  data  relationship  on 
some  unique  identifier  established  to  bridge  the  person  table  and  the 
fine  record  table.  Retrieving  the  natural  set  of  fines  would  require  a 
join  operation  that  can  be  expensive.  In  the  nested  relational  version, 
having  a  person's  record  would  automatically  provide  access  to  all  the 
fines  in  the  subtable.  This  is  also  true  of  an  object-oriented  version 
in  which  a  set  of  fine  objects  would  be  inside  the  person  object. 
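
The contrast between the two layouts can be sketched with plain Python data structures. The table contents and function names are invented for illustration, and a real system would use its own storage and query language rather than lists and dictionaries.

```python
# Flat (first normal form): two tables bridged by a unique identifier.
persons = [
    {"person_id": 1, "name": "Bill Caxton"},
    {"person_id": 2, "name": "Sally Jones"},
]
fines = [
    {"person_id": 1, "amount": 2.50},
    {"person_id": 1, "amount": 1.00},
    {"person_id": 2, "amount": 0.75},
]


def fines_by_join(person_id):
    """Recover a person's fines by matching keys across the two tables."""
    return [f["amount"] for f in fines if f["person_id"] == person_id]


# Nested (non-first normal form): the fines column holds a subtable.
nested_persons = {
    1: {"name": "Bill Caxton", "fines": [{"amount": 2.50}, {"amount": 1.00}]},
    2: {"name": "Sally Jones", "fines": [{"amount": 0.75}]},
}


def fines_nested(person_id):
    """Having the person's record gives direct access to the subtable."""
    return [f["amount"] for f in nested_persons[person_id]["fines"]]
```

Both functions return the same amounts; the flat version must scan or index the second table, while the nested version finds the fines inside the record it already holds.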

The  ability  to  nest  a  table  or  to  include  a  set  of  objects  exactly 
parallels  what  a  hierarchical  database  could  do.  However,  current 
database  researchers  would  probably  describe  these  older  hierarchical 
and  network  databases  as  obsolete.  They  are  out  of  date  because  they 
lack  the  firm  theoretical  foundation  of  relational  systems,  they  have 
inflexible  user  interfaces  and  rigid  physical  storage  structures,  and  they 
are  less  maintainable  than  an  OODBS. 

A  major  selling  point  of  relational  systems  is  their  flexibility.  A 
database  built  on  a  relational  foundation  does  not  predetermine  the 
relationships  between  data  elements.  The  system  is  able  to  exploit  all 
the  relationships  actually  in  the  data.  The  lack  of  this  flexibility  is 
the  subject  of  significant  criticism  of  the  object-oriented  model  by  the 
advocates  of  the  relational  model.  But  the  relational  researchers  are  not 
idle;  they  see  ways  to  expand  the  usefulness  of  their  systems  into  a 
"post-relational  model."  The  central  aspect  of  post-relational  systems 
is  extensibility.  They,  like  the  OODBSs,  will  offer  the  database  designer 
better  data  modeling  by  allowing  nested  relations  (these  researchers  are 
concerned  with  storing  and  manipulating  complex  entities  such  as 
graphics),  better  management  of  queries  by  allowing  them  to  be  stored 
and  compiled,  and  better  overall  behavior  by  allowing  procedures  to 
be  stored  in  the  database.  The  designer  will  also  have  more  control 
over  the  structure  and  internal  working  of  the  implemented  system. 
The  post-relational  researchers  hope  to  provide  a  significant  leap  in 
power  to  the  designers  so  that  each  system  will  perform  near  its  optimum 
expected  behavior  (Stonebraker  &  Rowe,  1986;  Carey  &  DeWitt,  1985; 
Lindsay,  1987). 

A  related  research  area  is  the  work  being  done  to  put  more 
intelligence  into  database  systems.  Some  of  this  work  is  a  bit  blurred 
with  the  extension  of  relational  systems,  because  one  possible  way  to 
extend  a  relational  database  system  is  to  give  it  intelligence.  However, 
some  of  the  basic  research  in  this  area  predates  the  call  for  relational 
extension.  There  are  several  components  to  this  research.  The  oldest 
is  the  desire  to  improve  the  query  interface  of  existing  relational  systems, 
which  usually  means  combining  a  logic  programming  language  such 
as  PROLOG  with  a  database.  The  basic  idea  here  is  to  give  more  power 
to  query  writers.  A  second  area  of  research  in  this  subfield  examines 
how  to  combine  expert  systems  with  databases.  Expert  systems  are  a 
commercial  outgrowth  of  research  in  artificial  intelligence.  They 
attempt  to  capture  the  knowledge  of  expert  humans  in  a  computer- 
based  system  that  can  perform  at  levels  near  the  human  expert.  They 
usually  involve  a  store  of  expert  experience  in  a  knowledge-base  and 
an  inference  engine  for  handling  outside  demands  by  processing  the 
knowledge  base.  Expert  systems  have  been  built  to  do  medical  diagnosis, 
repair  diesel  engines,  and  plan  the  installation  of  all  new  VAX  computers. 
Combining  expert  systems  with  regular  databases  appears  to  be  a  quite 
fruitful  area  for  future  development. 

Yet  another  area  of  research  is  putting  knowledge  and  rules  into 
databases.  In  an  unsophisticated  fashion,  this  has  been  going  on  for 
some  time  in  the  form  of  data  entry  checking.  The  central  idea  is  to 
make  the  database  system  more  knowledgeable  about  the  world  it  lives 
in.  So  besides  just  holding  the  data  for  a  business,  the  system  would 
hold  rules  about  that  world.  For  example,  if  the  system  were  in  a  library, 
it  would  be  told  the  rules  of  catalog  numbers.  When  a  book  is  cataloged, 
it  would  warn  of  an  incorrect  call  number  that  is  being  created  in  the 
shelflist  database.  Or  it  would  know  that  most  undergraduates  cannot 
carry  more  than  twenty  or  thirty  books  and  would  object  to  a  single 
person  attempting  to  check  out  that  many.  The  database  system  would 
be  given  rules  about  how  a  library  operates  that  would  constitute  meta- 
data about  the  library  world.  Meta-data  allows  a  database  system  to 
work  more  reasonably  with  fewer  errors  and  to  help  monitor  all  activity 
that  is  captured  by  information  stored  in  the  database  (Brodie,  1988; 
Hallaire,  1981;  Ceri  et  al.,  1986). 
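
The two library rules described above might be written down as follows. This is a minimal sketch, not a real cataloging validator: the call number pattern is a greatly simplified, assumed approximation of Library of Congress-style numbers, and the limit of twenty items is a hypothetical policy value.

```python
import re

# Assumed, simplified pattern: 1-3 class letters, a class number, an optional
# decimal, an optional cutter such as " .T3234", and an optional 4-digit year.
CALL_NUMBER_PATTERN = re.compile(
    r"[A-Z]{1,3}\d{1,4}(\.\d+)?( \.[A-Z]\d+)?( \d{4})?"
)

# Hypothetical policy: the most items one person may check out at once.
MAX_ITEMS_PER_CHECKOUT = 20


def check_call_number(call_number):
    """Return a warning if the call number breaks the rule, else None."""
    if CALL_NUMBER_PATTERN.fullmatch(call_number):
        return None
    return f"warning: '{call_number}' does not look like a valid call number"


def check_checkout(item_count):
    """Return a warning if one person is checking out too many items."""
    if item_count > MAX_ITEMS_PER_CHECKOUT:
        return (f"warning: {item_count} items exceeds the limit of "
                f"{MAX_ITEMS_PER_CHECKOUT}")
    return None
```

The rules live beside the data rather than inside each application program, which is the sense in which they act as meta-data about the library world.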

Once  database  systems  have  been  extended  and  smartened,  wouldn't 
they  be  expected  to  start  doing  things  on  their  own?  Some  database 
researchers  do  expect  it.  They  are  looking  forward  to  "active  databases" 
that  will  have  a  rule  knowledge  base  that  gives  them  the  meta-data 
about  their  environment.  So  when  some  event  occurs  or  some  set  of 
predetermined  circumstances  comes  about,  the  active  database  will  do 
something.  In  the  library  system  mentioned  above,  if  a  book  that  is 
"popular"  according  to  check-out  statistics  is  removed  from  the  shelflist, 
the  database  could  automatically  generate  a  purchase  recommendation 
that  would  be  brought  to  the  attention  of  the  acquisitions  staff.  The 
active  database  will  send  a  mail  message  to  an  acquisitions  librarian 
and  enter  the  recommendation  in  a  special  log.  This  is  an  extension 
of  an  older  database  idea  called  triggers.  Events  or  data  coming  through 
cause  the  system  to  take  some  action.  Often  it  is  to  do  some  logging 
or  bookkeeping  that  is  required.  But  once  procedures  are  put  into  an 
extended  database  system,  the  active  database  can  do  anything  the 
computer  can  do  (Dayal  et  al.,  1988). 
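
The older trigger idea can be tried with Python's standard sqlite3 module, since SQLite supports CREATE TRIGGER. The sketch below is illustrative only: the table names, the column names, and the threshold of 50 check-outs that defines "popular" are all assumptions, and the trigger merely writes to a log table. Sending a mail message would have to be done by a program that reads that log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shelflist (
    call_number TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    checkouts   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE purchase_log (
    title  TEXT NOT NULL,
    reason TEXT NOT NULL
);
-- Rule: when a popular title leaves the shelflist, log a recommendation.
CREATE TRIGGER recommend_replacement
AFTER DELETE ON shelflist
WHEN OLD.checkouts >= 50
BEGIN
    INSERT INTO purchase_log (title, reason)
    VALUES (OLD.title, 'popular title withdrawn; consider repurchase');
END;
""")

# Sample rows; the second title and both check-out counts are invented.
conn.executemany(
    "INSERT INTO shelflist VALUES (?, ?, ?)",
    [("PS3537 .T3234", "East of Eden", 120),
     ("Z699 .X1", "Rarely Borrowed Manual", 2)],
)

# Withdrawing both items fires the trigger only for the popular one.
conn.execute("DELETE FROM shelflist WHERE call_number = ?", ("PS3537 .T3234",))
conn.execute("DELETE FROM shelflist WHERE call_number = ?", ("Z699 .X1",))

recommendations = [row[0] for row in conn.execute("SELECT title FROM purchase_log")]
```

The application that deletes the shelflist record never mentions the purchase log; the database acts on its own rule, which is the behavior an active database generalizes.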


CONCLUSION 

In  a  number  of  ways,  the  library  world  has  been  a  leader  in  what 
is  popularly  referred  to  as  the  information  revolution.  But  because  of 
limited  resources,  it  is  not  likely  that  libraries  and  librarians  can  continue 
to  lead  the  revolution.  Limited  budgets  just  do  not  permit  it.  However, 
it  seems  clear  that  there  are  two  figurative  Bastilles  that  must  be  stormed 
before  the  revolution  can  be  won.

The  first  is  technological.  Many  librarians  seem  to  think  that
installing  a  personal  computer,  even  one  with  CD-ROM  databases,  is
a  great  step  forward  towards  winning  the  fight.  In  this  author's  view,
putting  something  as  primitive  as  a  DOS-based  machine  into  the  hands 
of  patrons  is  not  necessarily  a  step  forward.  The  technological  gap 
must  be  bridged  to  win  the  revolution.  The  gap  is  not  just  the  absence 
of  computers  in  libraries;  it  is  also  the  existence  of  obsolete  and  hard- 
to-use  computers  getting  in  the  patron's  way.  The  Bastille  that  must 
be  stormed  is  unfriendly  machines  and  programs.  Easy  (meaning 
REALLY  easy  here,  so  that  the  computer  illiterate  off  the  street  can 
use  it)  and  intelligent  access  to  the  information  is  a  must  (Baecker  & 
Buxton,  1987;  Barstow  et  al.,  1984;  Smith  &  Green,  1980;  Hartson  &
Hix,  1989).  What  is  the  good  of  a  revolution  if  only  a  handful  of  citizens 
may  partake?  To  win  an  information  revolution,  access  to  information 
must  be  made  as  easy  as  using  the  manual  card  catalog.  The  tools 
provided  will  have  to  be  intelligent,  so  they  will  have  to  be  based  on 
systems  that  are  built  by  knowledge  engineers  using  the  best  fruits  of 
artificial  intelligence  research.  The  technology  advances  needed  are  quite 
significant.  In  many  areas,  even  a  start  has  not  been  made  and,  where 
a  beginning  exists,  there  is  often  frustration  with  the  difficulty  of  the 
task  ahead  (Blair  &  Maron,  1985;  Witten  &  Bramwell,  1985). 

The  second  area  needing  aggressive  attention  is  economic.  What
good  is  access  to  billions  of  pieces  of  data  and  the  use  of  fancy  computer 
and  AI  tools,  if  the  patron  can't  afford  to  use  them?  Presently,  any 
area  librarian  can  search  a  national  citation  resource  on  Dialog  and 
in  less  than  five  minutes  can  find  hundreds  of  citations  that  might 
be  useful  in  researching  a  patron's  paper.  But  the  patron  can't  afford 
the  service  because  searching  Dialog  databases  for  a  few  minutes  can 
cost  hundreds  of  dollars.  There  can  be  no  true  information  revolution 
until  the  average  citizen  can  have  affordable  access  to  the  information 
needed.  It  must  be  as  affordable  as  books  are  in  present  day  libraries. 
For  the  near  future,  this  Bastille  is  even  tougher  than  the  technological 
one.  It  is  fair  to  say  that  achieving  the  goal  of  easy  access  will  mean 
increasing  the  revolution's  cost,  because  it  will  take  fancier  equipment 
and  much  more  thinking  and  programming. 

Object-oriented  technology  would  be  quite  beneficial  in  many 
applications  in  the  library  world.  Systems  built  around  an  OODBS  would 
be  very  cost-effective.  OODBSs  could  help  support  the  technological 
advances  needed  to  bring  about  a  true  information  revolution.  However, 
will  we  see  these  advances  soon?  This  is  very  much  like  the  question, 
"When  do  you  need  the  SST?"  Remember  that  when  that  question 
was  asked  in  the  United  States,  citizens  answered,  "Not  now,  maybe 
not  ever;  we  can't  afford  it."  When  the  question  was  asked  in  Europe, 
they  answered,  "Well,  we'd  like  to  build  the  SST  like  the  Americans 
envisioned,  but  we  don't  have  the  technology  to  do  it,  so  we'll  build
a  Concorde  instead  and  act  like  we  built  the  SST."  (Recall  that  the 
Concorde  flies  about  500  mph  slower  than  the  planned  SST  and  carries 
about  100  fewer  passengers.)  One  guess  is  that,  for  a  while,  the  answer
to  "Do  we  need  OODBSs  in  libraries?"  will  be  "Not  now,  maybe  later."


A  Realistic  Blue  Sky  System


INTRODUCTION 

A  conference  is  just  an  admission 
that  you  want  somebody  to  join  you  in 
your  troubles. 

—Will  Rogers 

This  particular  conference  has  a  long  history  of  providing  the 
opportunity  for  librarians  to  join  together  in  their  troubles/concerns. 
In  this  spirit,  the  author  would  like  to  share,  if  not  her  troubles,  at 
least  some  concerns. 

My  first  experience  with  computers  was  in  1974  when,  on  the  second 
day  of  employment,  my  new  boss  said,  "Oh,  by  the  way,  that  thing 
sitting  in  the  middle  of  the  cataloging  department  is  our  new  CLSI 
computerized  circulation  system.  It  was  delivered  last  week.  It  will  be 
your  responsibility  to  load  the  data  and  get  the  system  up  and  running." 
I  had  to  learn  fast,  and  have  been  learning  ever  since;  in  the  process, 
I  have  become  somewhat  of  an  expert  on  the  relative  merits  of  various 
database  management  systems,  or  at  least  qualified  to  discuss  what  you 
may  want  and/or  need  a  system  to  do.  It  is  important  to  have  computer 
people  around  to  help  evaluate  the  technical  aspects  of  the  hardware 
and  system  software.  They  may  also  be  needed  to  run  the  system  once 
it  has  been  selected.  But,  in  the  final  analysis,  if  the  applications 
programs  do  not  support  the  activities  of  your  library  and  do  what 
you  want  done,  it  really  doesn't  matter  if  you  have  a  Cray  supercomputer 
with  the  latest  operating  and  database  management  systems  or  a  Brand 
X  microcomputer  from  a  mail  order  house. 

So,  how  do  you  know  if  a  system  is  the  right  one  for  your  library? 
How  much  power  is  enough?  It  is  a  long,  detailed  and  often  laborious 
process  to  arrive  at  an  answer.  Among  other  things,  it  involves  system
specification,  vendor  scrutiny,  and  vendor  selection.  The  first  step,  system 
specification,  is  the  focus  of  this  paper. 


SYSTEM  SPECIFICATIONS 

There  are  several  ways  to  write  system  specifications.  The  safest 
way  to  be  sure  that  a  system  will  meet  your  library's  needs  is  to  involve 
your  staff  and  to  write  the  specifications  yourself.  This  will  involve 
three  steps: 

1.  Analyze  and  identify  areas  in  critical  need  of  change/support; 

2.  Blue  sky;  and 

3.  Compromise,  which  means  prioritize  or  rank  those  areas  identified 
in  step  1. 

Analyze  and  Identify  Critical  Areas 

The  first  step  in  this  process  is  to  analyze  what  you  are  currently 
doing  and  why.  That,  you  may  say,  is  easy.  There  are  written  policies 
and  procedure  manuals  for  your  library,  and  each  department  has  its 
own  manual  covering  its  specific  procedures.  But  when  was  the  last
time  those  manuals  were  updated?  How  many  employees  have  come 
and  gone  with  their  own  interpretations  of  what  was  written  down? 
How  many  staff  have  been  trained  by  someone  who  had  their  own 
interpretation  of  what  was  meant  and  who,  in  turn,  trained  someone 
else  and  further  distorted  the  original  intent?  It  is  very  easy  to  say  that 
you  know  what  is  being  done  because  it  is  written  down,  but  it  is 
quite  possible  that  many  things  are  being  done  that  are  not  in  the 
manual.  It  is  also  likely  that  some  things  which  are  done  are  a  very 
distorted  version  of  what  is  in  the  manual.  This  is  particularly  true 
if  the  manual  is  more  than  six  months  old. 

How  do  you  find  out  what  is  really  going  on  in  your  library?  There 
are  a  number  of  ways: 

— Ask  your  staff  to  write  down  exactly  how  they  do  their  jobs. 
— Ask  them  to  do  this  without  reference  to  the  current  manual,  if  there
is  one.
— Ask  them  to  explain  why  they  do  what  they  do. 

You  will  gain  some  valuable  information  this  way.  You  will  not 
only  discover  what  is  being  done  but  also  why  it  is  being  done  that 
way.  You  will  also  discover  just  exactly  how  much  your  staff  understands 
the  mission,  goals,  and  objectives  of  the  library,  and  you  will  be  amazed
at  the  number  of  procedures  which  exist  in  a  vacuum  because  your 
staff  has  little  or  no  idea  of  the  overall  picture  of  your  operation. 

At  this  stage,  you  also  need  to  identify  and  examine  your  priorities. 
What  things  are  working  smoothly  in  your  manual  or  current  automated 
systems?  What  things  are  not?  For  instance,  is  your  current  circulation 
system  operating  smoothly  or  is  it  a  chaotic  mess?  If  it  is  working 
smoothly,  you  may  not  want  to  give  it  a  high  priority  for  replacement. 
Is  your  card,  microfiche  or  book  catalog  simply  beyond  redemption? 
If  so,  this  may  be  your  high  priority  problem,  but  if  it  is  working 
smoothly,  it  will  receive  a  lower  priority  rating.  Is  your  acquisition 
process  in  such  disarray  that  you  have  little  control  over  funds?  This 
may  or  may  not  be  a  serious  problem  in  your  library.  Is  the  scheduling 
at  the  reference  or  circulation  desk  getting  to  be  so  complicated  that 
it  takes  up  half  or  more  of  one  person's  time?  This  is  quite  often  a 
problem  in  a  large  library.  Is  the  budget  totally  out  of  control  because 
you  have  a  poor  manual  procedure  for  monitoring  it?  Or,  do  you  have 
to  wait  three  months  for  a  computer  report  from  a  city,  state  or  university 
agency  before  you  know  where  you  stand?  For  many  libraries,  this  is 
a  critical  issue.  Are  you  far  behind  in  writing  quarterly,  annual  and 
other  reports?  Some  libraries  do  not  stress  such  reports  as  heavily  as 
others,  so  this  may  or  may  not  be  a  problem  for  you. 

Do  you  have  one  or  more  building  projects  going  on  either  at  your 
main  or  branch  libraries?  It  is  reasonable  to  expect  the  builder  to  keep 
you  updated,  but  they  don't  always  do  so.  Do  the  patrons  of  your 
bookmobiles  or  other  outreach  type  services  get  short  shrift  because 
it  is  impossible  to  get  them  materials  in  a  timely  fashion  because  there 
is  no  en  route  access  to  the  library's  holdings?  There  are  ways  to  handle 
this  problem,  including  portable  radios,  terminals,  or  cellular  phones. 
Do  fund-raising  activities  need  more  detailed  and  up-to-date  records 
than  a  manual  system  allows?  This  can  be  a  critical  need,  especially 
if  you  rely  heavily  on  endowed  funds  for  operating  expenses  or  to  finance 
capital  expenditures  such  as  a  new  automated  system.  Is  the  maintenance 
of  your  main  and/or  branch  libraries  a  problem  because  of  inadequate 
repair  records?  In  my  experience,  this  is  a  very  thorny  problem,  especially 
with  bookmobiles.  Are  some  functions  already  automated?  Decide  what 
you  are  going  to  do  with  those  systems  as  you  look  toward  a  new  one. 
Are  they  going  to  be  phased  out,  linked,  or  integrated?  These  are  just 
a  few  of  the  questions  which  you  need  to  ask  yourselves. 

The  next  step,  after  identifying  the  problems  which  your  library 
faces,  is  to  identify  those  which  automation  can  solve,  which  would 
be  better  solved  by  some  other  method,  which  should  be  part  of  an 
overall  system,  and  which  would  be  better  suited  as  independent,  stand- 
alone systems.  For  instance,  the  monitoring  of  the  building  project(s) 
might  be  better  handled  by  an  independent  project  management  package
which  runs  on  a  microcomputer.  On  the  other  hand,  budget  monitoring 
could  and  probably  should  run  on  your  acquisitions  system,  since  a 
sizeable  chunk  of  the  budget  is  for  materials.  The  materials  budget 
will  be  controlled  by  the  system,  so  why  not  let  it  control  your  whole 
budget? 

"Blue  Sky"  and  the  Online  Public  Access  Function 

In  the  process  of  discovering  and  identifying  your  priorities,  you 
can  begin  the  second  step,  which  is  to  "blue  sky."  Blue  sky  simply 
means  to  think  about  an  ideal  system.  This  step  has  little  relation  to 
what  is  really  possible,  affordable,  or  any  other  practical  consideration. 
It  is  important  at  this  stage  to  constantly  caution  your  staff  that  the 
ideal  system  does  not  exist;  however,  it  is  equally  important  to  decide 
what  is  desirable  before  looking  at  systems  and  hardware.  With  the 
rapid  developments  in  the  field  of  automation,  many  things  which  were 
not  technologically  possible  five  years  ago  are  now  feasible — so  go  ahead 
and  dream.  By  the  time  your  specifications  are  finished  and  money 
is  available,  it  is  quite  probable  that  at  least  some  of  the  things  which 
are  not  possible  now  will  be  then. 

One  note  of  caution  here.  At  this  point,  cost  or  size  of  the  computer 
to  support  your  blue  sky  system  should  not  be  considered.  Not  everything 
you  want  will  be  affordable  or  technologically  feasible,  but  you  should 
clearly  identify  and  state  on  paper  what  is  desirable  before  you  begin 
the  compromise  process  which  reality  will  impose  upon  you.  Besides, 
you  might  be  surprised  at  how  much  of  what  you  want  is  feasible  and 
affordable  right  now.  (As  a  good  friend  of  mine  always  says,  "If  you 
don't  ask,  you  surely  won't  get  it!") 

How  is  "blue  skying"  done  for  an  online  public  access  function 
or  automated  catalog?  The  first  thing  to  consider  is  the  patron's  point 
of  view.  After  all,  the  patron  is  the  one  for  whom  you  are  designing 
it.  It  would  be  most  helpful  if  you  could  include  some  of  your  more 
knowledgeable  patrons  in  this  effort.  If  that  is  not  possible,  do  the 
best  you  can  to  forget  everything  you  know  about  library  practices, 
theory,  automated  systems,  and  searching  techniques.  (For  other 
functions  such  as  acquisitions,  you  can  "blue  sky"  from  the  librarian/ 
staff  point  of  view  because  these  functions  will  be  used  primarily  by 
staff.  However,  do  not  forget  that  in  an  integrated  environment  some 
of  the  information  from  those  functions  will  be  available  to  patrons.) 
The  one  thing  you  must  keep  firmly  in  mind  is  that  patrons  want 
what  they  want,  when  they  want  it,  whether  or  not  they  know  what 
it  is  they  want.  There  are  at  least  seven  areas  which  you  will  need 
to  consider:  search  strategies,  extended  features,  circulation  status,
gateway  services/networking,  downloading  data,  user-friendly  search 
levels,  and  remote  access. 

Search  Strategies 

From  the  patron's  point  of  view,  the  first  thing  you  want  is  a  system 
which  will  allow  the  database  to  be  searched  for  a  known  or  unknown 
item.  This  may  sound  relatively  simple  but  it  is  actually  fairly  complex. 
What  is  known  about  the  item — author,  title,  subject,  call  number,  or 
all  or  any  combination  of  this  information?  At  a  minimum,  the  system 
should  allow  searching  on  each  of  these  fields  or  any  combination  of 
them  such  as  author/title,  author/author,  author/subject,  title/subject 
or  subject/subject.  But  what  if,  for  instance,  a  patron  wants  information 
about  William  Shakespeare  but  does  not  understand  how  the  library 
has  handled  it?  Is  he  an  author,  title,  or  subject?  A  patron  might  very 
well  assume  that  an  author  search  should  be  used  because,  after  all, 
Shakespeare  is  an  author.  If  a  patron  does  an  author  search,  he  or  she 
will  find  all  the  materials  by  Shakespeare  and  may  conclude  that  the 
library  has  no  material  about  Shakespeare.  A  search  by  title  will  find 
some  things  about  him,  but  only  a  full  subject  search  will  find  the 
wealth  of  material  which  the  library  owns.  A  universal  search  function 
would  allow  the  patron  to  enter  the  information  available  without 
specifying  whether  it  is  an  author,  title,  or  subject.  With  data  entry 
occurring  only  one  time,  the  system  would  retrieve  all  items  which 
contain  the  search  term(s)  and  tell  how  many  subjects,  authors,  and 
titles  it  has  found.  Patrons  could  then  choose  to  look  at  one  or  all 
of  the  categories  and  their  attached  titles.  This  is  particularly  important 
when  dealing  with  corporate  bodies  and  conferences  which  may  or  may 
not  be  treated  as  authors  by  the  library,  and  which  many  users  would 
never  think  of  as  authors.  Another  consideration  is  that  patrons  normally 
do  not  think  of  people's  names  in  surname-forename-initial  order.  A 
system  which  allows  author  searching  in  only  that  sequence  is  not  very 
responsive  to  the  user's  needs.  The  user  should  not  have  to  know  that 
the  author's  surname  is  a  double  or  hyphenated  one,  or  that  the  official 
entry  for  T.  S.  Eliot  was  at  one  time  Thomas  Stearns  Eliot.  After  all, 
most  patrons  are  not  versed  in  the  1949  LC  Rules,  AACR1  or  AACR2, 
some  or  all  of  which  may  be  represented  in  your  library's  database. 
They  should  be  able  to  type  in  a  subject  as  they  think  of  it,  e.g.,  ESP,
not  extrasensory  perception;  or  Civil  War,  not  United  States — History — 
Civil  War,  1861-1865. 
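
A universal search of the kind described above might look like the following sketch. It is only an illustration: the record layout is assumed, the matching is a plain substring test, and every sample record except Hamlet and East of Eden is invented for the example.

```python
from collections import namedtuple

Record = namedtuple("Record", "title authors subjects")

# Illustrative sample records; the second and third are invented.
CATALOG = [
    Record("Hamlet", ["Shakespeare, William"], ["Tragedies"]),
    Record("Understanding Shakespeare", ["Doe, Jane"],
           ["Shakespeare, William -- Criticism"]),
    Record("Playhouses of London", ["Roe, Richard"],
           ["Shakespeare, William -- Stage history"]),
    Record("East of Eden", ["Steinbeck, John"], ["Fiction"]),
]


def universal_search(term, catalog):
    """Search authors, titles, and subjects at once; group hits by field."""
    needle = term.lower()
    hits = {"author": [], "title": [], "subject": []}
    for rec in catalog:
        if any(needle in a.lower() for a in rec.authors):
            hits["author"].append(rec.title)
        if needle in rec.title.lower():
            hits["title"].append(rec.title)
        if any(needle in s.lower() for s in rec.subjects):
            hits["subject"].append(rec.title)
    return hits


def summarize(hits):
    """Report how many items were found in each category."""
    return {field: len(titles) for field, titles in hits.items()}


hits = universal_search("shakespeare", CATALOG)
counts = summarize(hits)
```

The patron types the term once, sees how many authors, titles, and subjects matched, and then chooses which category to open. No knowledge of how the library has treated the name is required.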

What  if  a  patron  knows  the  exact  author  or  title  of  an  item,  e.g., 
East  of  Eden?  The  patron  does  not  want  the  system  to  retrieve  all  the 
titles  which  have  those  words  somewhere  in  the  title.  Only  that  book, 
its  location,  and  its  availability  status  are  wanted.  On  the  other  hand, 
what  if  only  part  of  an  author/title/subject  is  known?  A  system  should
be  able  to  retrieve  that  for  the  patron,  too.  For  instance,  entering  "The 
Marketing  of  Alaska"  should  retrieve  the  title  "Lost  Frontier:  The 
Marketing  of  Alaska";  entering  "Public  Prayer  and  the  Supreme  Court" 
should  retrieve  the  title  "The  Supreme  Court  and  Public  Prayer";  and 
entering  "Henri  Rousseau"  should  retrieve,  among  other  titles,  "Portrait 
of  a  Primitive:  The  Art  of  Henri  Rousseau."  In  other  words,  both 
adjacency  searching  and  keyword  searching  are  needed.  Also  necessary 
is  Boolean  searching  which  is  easy  enough  for  the  computer-phobic 
patron.  Patrons  may  want  to  search  titles  which  are  alternate  titles  or 
located  in  contents  notes  or  added  entries.  Furthermore,  they  may  want 
to  search  subjects  using  their  own  terminology  and  logic,  not  just  the 
Library  of  Congress's. 
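
The difference between exact, adjacency, and keyword searching can be shown with the titles mentioned above. This is a minimal sketch under stated assumptions: the stopword list is hypothetical, keyword searching is treated as an implicit Boolean AND, and a real system would use an index rather than scanning every title.

```python
import re

TITLES = [
    "Lost Frontier: The Marketing of Alaska",
    "The Supreme Court and Public Prayer",
    "Portrait of a Primitive: The Art of Henri Rousseau",
    "East of Eden",
]

# Hypothetical list of words too common to be useful as keywords.
STOPWORDS = {"a", "and", "of", "the"}


def words(text):
    """Lowercase a string and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_search(query, titles):
    """Known-item search: the whole title must match the query."""
    return [t for t in titles if words(t) == words(query)]


def adjacency_search(query, titles):
    """The query words must appear side by side, in the order entered."""
    q = words(query)
    if not q:
        return []
    found = []
    for t in titles:
        w = words(t)
        if any(w[i:i + len(q)] == q for i in range(len(w) - len(q) + 1)):
            found.append(t)
    return found


def keyword_search(query, titles):
    """All significant query words must appear, in any order (Boolean AND)."""
    q = set(words(query)) - STOPWORDS
    if not q:
        return []
    return [t for t in titles if q <= set(words(t))]
```

Adjacency searching finds "The Marketing of Alaska" inside the longer title, while only keyword searching finds "The Supreme Court and Public Prayer" when the patron enters the words in a different order.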

Extended  Features 

What  extended  features  would  patrons  like  to  have  in  an  online 
catalog?  These  items  are  ones  which  might  be  considered  curlicues  by 
some  and  downright  essential  by  others.  For  instance,  a  poor  speller 
might  like  a  system  with  a  spelling  checker,  which  would  open  a  window 
next  to  the  unknown  word  and  ask  if  perhaps  it  is  misspelled.  Another 
patron  might  want  a  system  which  uses  a  microcomputer  or  workstation 
with  a  color  screen  and  at  least  an  EGA  (Enhanced  Graphics  Adapter)
monitor  so  that  all  languages  can  be  displayed  on  one  terminal  screen. 
Another  desirable  feature  could  be  a  template  that  allows  a  patron  to 
enter  a  search  in  any  alphabet,  whether  roman  or  non-roman.  Additional 
features  could  limit  searches  by  language,  date  of  publication,  kind  of 
material,  level  of  material,  and  place  of  publication  and/or  holding 
library,  at  a  minimum.  You  may  very  well  think  of  others. 

Circulation  Status 

Circulation  status  is  a  must  in  an  online  catalog.  Patrons  need 
to  know  before  they  go  to  the  shelf  that  the  material  is  there.  They 
do  not  want  to  go  looking  for  material  which  is  checked  out,  on  order, 
lost,  missing,  received  but  not  processed,  etc.  The  system  should  be 
able  to  automatically  place  a  hold  on  the  title  if  it  is  checked  out, 
on  order,  or  received  but  not  processed;  or  to  place  an  interlibrary  loan 
request  if  the  material  is  lost  or  missing. 
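
The rule in the preceding paragraph reduces to a small lookup. In this sketch the status strings follow the wording above, while the "on shelf" status and the fallback of referring the patron to staff are assumptions added to make the example complete.

```python
# Statuses for which the item will eventually be available locally.
HOLD_STATUSES = {"checked out", "on order", "received but not processed"}

# Statuses for which the library's own copy cannot be supplied.
ILL_STATUSES = {"lost", "missing"}


def next_action(status):
    """Choose what the catalog should offer the patron for a given status."""
    status = status.strip().lower()
    if status == "on shelf":          # assumed status name
        return "go to shelf"
    if status in HOLD_STATUSES:
        return "place hold"
    if status in ILL_STATUSES:
        return "request interlibrary loan"
    return "ask staff"                # assumed fallback for unknown statuses
```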

Gateway  Services/Networking

If  the  library  does  not  own  the  item  or  have  material  on  the  subject 
a  patron  wants,  you  might  like  to  continue  the  search  onto  other  libraries' 
databases  and/or  onto  OCLC  or  other  commercial  databases,  or  any 
CD-ROM  databases  which  are  available  locally,  without  changing  or 
modifying  the  search  strategy  which  was  used  on  the  local  system.

Downloading  Data

Once  items  have  been  searched  for  and  identified,  and  citations 
and/or  abstracts  retrieved,  the  system  should,  at  the  very  least,  be  able 
to  print  that  composite  list.  Even  better  would  be  the  capability  to 
download  from  all  the  various  databases  onto  a  floppy  disk  which  users 
could  take  with  them  to  upload  onto  their  own  databases.  Better  yet, 
an  Integrated  Scholarly  Information  System  would  allow  for  the 
manipulation  and  integration  of  data  from  several  databases  on  that 
floppy,  rather  than  just  downloading  ASCII  files  and  then  leaving  it 
to  the  users  to  figure  out  how  to  manipulate/integrate  them  on  their 
own. 

User-Friendly  Search  Levels 

Patrons  want  a  system  which  will  hold  their  hands  the  first  few 
times  they  use  it  so  that  they  are  not  panicked  by  it  nor  made  to  feel 
stupid.  A  built-in  tutorial  would  be  nice.  But  if,  after  the  first  few  uses, 
patrons  feel  comfortable  with  it  and  do  not  want  all  that  handholding, 
they  would  like  to  be  able  to  shortcut  some  of  the  long-winded 
instructions  and  fly  on  their  own  (to  a  limited  extent).  The  "help" 
button  should  still  be  available  at  any  time  and  anywhere  in  the  system. 
Patrons  may  even  become  so  expert  at  using  the  system  that  they  really 
don't  want  any  help  at  all — just  a  blank  screen  on  which  they  can 
enter  a  search  strategy.  The  system  should  be  consistent  in  instruction/ 
procedure,  i.e.,  the  same  keys  will  always  serve  the  same  functions.  Color- 
coded  keys  can  also  help  the  patron  know  easily  and  quickly  which 
one  to  push.  The  bottom  of  the  screen  might  have  icons  which  show 
which  options  are  available — red,  green,  yellow,  etc.  with  their 
corresponding  definitions.  The  top  of  the  screen  might  have  a  status 
line  that  tells  patrons  what  information  they  entered  into  the  system 
and  what  the  system  is  currently  retrieving.  The  information  could  be 
displayed  under  the  patron's  search  term  even  though  the  Library  uses 
another  term.  This  would  be  informational  only,  as  the  system  would 
have  already  retrieved  the  relevant  citations.  The  system  should  also 
allow  the  patrons  to  look  back  at  their  searches  at  any  time  in  a  particular 
session  at  the  terminal  and  see  their  results.  Scholars  who  have  been 
given  the  authorization  could  have  the  system  keep  historical  records 
of  their  searches  so  that,  when  they  access  the  system  the  next  time, 
they  can  verify  online  what  they  have  already  searched  and  view  the 
results  rather  than  relying  on  memory  or  jotted  notes.  Other  patrons
could  enter  areas  of  research/subject  interest  and  have  the  system  tell 
them  every  time  they  log  on  what  new  materials  have  been  added  in 
these  fields  since  they  last  used  the  system. 
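
The last feature, a report of new materials in a patron's fields of interest, amounts to a filter on the date each item was added. The sketch below assumes a hypothetical acquisitions log of (title, subject, date added) entries; the titles, subjects, and dates are placeholders.

```python
from datetime import date

# Hypothetical acquisitions log: (title, subject, date added).
ACQUISITIONS = [
    ("Title A", "astronomy", date(1989, 3, 1)),
    ("Title B", "botany", date(1989, 4, 15)),
    ("Title C", "astronomy", date(1989, 5, 2)),
]


def new_since(last_login, interests, acquisitions):
    """Titles in the patron's interest areas added after the last login."""
    return [title for title, subject, added in acquisitions
            if subject in interests and added > last_login]
```

A patron interested in astronomy who last logged on on April 1 would be shown only the item added in May.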

Remote  Access

What  kind  of  access  is  appropriate  or  needed?  Patrons  might  want 
to  access  the  system  from  home,  office,  car,  boat,  or  airplane.  However, 
personal  terminals  may  not  have  the  capabilities  of  graphics  and  color 
that  the  dedicated  terminals  do.  So  some  kind  of  screen  display  is  needed 
which  will  make  sense  on  a  personal  computer.  A  nondedicated  terminal 
may  not  be  able  to  display  the  ALA  character  set  or  non-roman  characters; 
thus,  some  kind  of  transliteration  or  translation  table  should  be  included 
so  that  patrons  don't  get  gibberish  on  their  screens. 
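
One way to build such a table for roman-alphabet text with diacritics is sketched below, using Python's standard unicodedata module. The fallback table is a small, hypothetical sample, and the function only strips accents and substitutes a few special letters. True transliteration of non-roman scripts would need a separate table for each script, which this sketch does not attempt; such characters are simply shown as "?".

```python
import unicodedata

# Hypothetical fallback table for letters that have no accent to strip.
FALLBACK = {
    "ø": "o", "Ø": "O", "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D", "æ": "ae", "Æ": "AE", "ß": "ss",
}


def to_plain_ascii(text):
    """Reduce text to characters that any plain terminal can display."""
    out = []
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue              # drop accents split off by normalization
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
        elif ch.isascii():
            out.append(ch)
        else:
            out.append("?")       # no mapping known; mark it visibly
    return "".join(out)
```

The patron at a nondedicated terminal then sees "Dvorak" rather than gibberish, at the cost of losing the diacritics.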

Other  Factors 

The  list  of  considerations  may  seem  practically  endless.  Only  a 
few  of  the  things  which  an  online  public  access  function  might/should 
do  have  been  identified.  Priorities  and  specifications  need  to  be  thought 
out  for  every  function  which  you  want  supported  by  your  system.  (For 
instance,  if  you  want  an  acquisitions  function,  you  will  need  to  proceed 
through  this  same  process.)  When  the  process  is  completed  and  the 
specifications  are  written  down  for  each  function  that  you  want  in  a 
system,  the  result  will  be  a  document  which  represents  your  wildest 
dreams  come  true. 

Compromise 

The  final  step  in  the  process  of  specifying  a  system  is  to  be  realistic. 
This  is  actually  a  three-step  process. 

1.  Look  at  any  local  requirements.  Your  library's  parent  body — be  it 
city,  campus,  or  company — may  be  committed  to  one  particular  brand 
of  computer.  Or,  they  may  be  committed  to  a  particular  operating 
environment  such  as  Unix.  If  so,  you  must  add  that  requirement 
to  your  specifications.  This  may  or  may  not  severely  limit  the  systems 
which  you  can  consider. 

2.  Look  at  what  systems  are  available.  There  are  several  ways  to  examine 
systems.  One  is  to  go  to  ALA's  annual  and  midwinter  conferences, 
state  library  conferences,  or  other  conferences  where  automation  will 
be  featured  in  the  exhibit  area.  Another  is  to  go  to  nearby  libraries 
which  have  systems  and  see  how  the  individual  system  works  there. 
A  third  alternative  is  to  ask  vendors  to  visit  your  site  and  do  a 
demonstration.  However  you  go  about  it,  you  will  want  to  evaluate 
the  demonstration  system  in  light  of  the  priorities  of  the  blue  sky 
specifications  you  have  drawn  up.  It  will  become  apparent  in  a  fairly 
short  amount  of  time  which  specifications  will  need  to  be  modified 
in  the  cold  light  of  reality.  You  should  now  examine  your  list  of 
priorities  and  specifications  and  identify  those  items  which  are
mandatory  and  those  which  are  optional.

3.  Weigh  cost,  needs,  and  budget  against  your  specifications.  If  a
microcomputer  is  all  you  can  afford  and  your  library  is  small  enough 
to  be  supported  by  a  micro  system,  then  you  will  need  to  do  an 
in-depth  evaluation  of  those  systems.  If,  on  the  other  hand,  your 
library  can  afford  a  mainframe  and  you  need  its  power,  then  you 
will  need  to  evaluate  those  systems  in  light  of  your  specifications. 
If  a  minicomputer-based  system  is  what  you  can  reasonably  afford, 
you  will  need  to  visit  with  those  vendors  which  come  closest  to  meeting 
your  specifications.  You  may  find  that,  even  though  you  thought 
you  needed  a  mainframe  system,  many  of  your  specifications  which 
are  mandatory  can  be  met  by  a  minicomputer-based  system.  It  may 
be  that  only  a  few  of  those  items  labeled  mandatory  or  optional 
will  be  unsupported.  Unsupported  specifications  need  to  be  looked 
at  very  closely  to  determine  if  they  are  absolutely  essential.  A  move 
to  the  next  level  of  computing,  i.e.,  a  mainframe,  may  not  be  worth 
the  increased  cost  and  complexity  of  system  operation.  By  the  same 
token,  a  micro-  or  supermicro-based  system  may  meet  the  majority 
of  your  mandatory  needs.  You  will  need  to  carefully  evaluate  whether 
it  is  worth  the  cost  to  buy  the  mini-based  system  with  its  more  complex 
operating  environment,  or  whether  it  is  more  cost-effective  to  select 
a  cheaper  system.  Of  course,  there  is  always  the  possibility  that  you 
need  a  mini-  or  mainframe-based  system  and  you  simply  cannot  afford 
it.  This  is  perhaps  the  hardest  decision  of  all:  Do  you  compromise 
and  take  what  you  can  afford,  knowing  it  does  not  meet  your  needs? 
If  you  do  this,  you  should  carefully  evaluate  what  the  system  will 
do  and  what  you  will  ask  it  to  do.  The  other  possibility  is  that  you 
will  delay  acquisition  of  a  system  until  you  can  get  one  that  more 
closely  meets  your  needs.  That  is  not  a  question  which  can  be  answered 
easily.  You  must  consider  the  local  situation,  the  politics  involved, 
and  the  likelihood  of  future  financing. 

One  word  of  warning  here.  Assuming  that  you  have  a  reasonable 
financial  latitude,  you  should  still  keep  in  mind  that  some  things  truly 
are  mandatory  and  are  worth  the  cost.  Do  not  compromise  your  system 
to  the  point  that  it  does  not  meet  your  needs.  On  the  other  hand,  do 
not  insist  on  something  which  is  of  marginal  value.  Only  you  can  make 
that  decision  within  the  context  of  your  local  environment.  What  is 
marginal  to  one  library  may  be  absolutely  essential  to  another.  For 
instance,  in  my  library  it  is  considered  essential  that  the  system  have 
the  capability  for  a  universal  search.  For  your  library  and  its  patrons,
you  might  consider  that  a  nice  feature  but  not  worth  the  money  which 
it  will  take  to  acquire  a  mainframe  to  support  it.  It  is  also  considered
essential  at  my  library  that  the  ALA  character  set  and  non-roman
alphabets  be  displayed  for  patron  use.  In  your  library,  this  might
be  not  only  undesirable  but  even  a  disservice.


CONCLUSION 

At  this  point,  my  answer  to  the  question,  "How  much  power  is 
enough?"  is:  It  depends.  It  depends  on  what  you  want  your  system 
to  do;  what  external  factors  there  are  in  your  local  situation,  such  as 
commitment  to  a  particular  brand  of  computer;  what  you  can  afford 
and  how  much  you  are  willing  to  compromise  to  come  within  your 
budget;  what  your  time  frame  for  installation  is;  what  bibliographic 
information  or  other  data  is  in  your  database;  what  clientele  you  serve; 
and  what  the  political  climate  of  your  governing  body  is.  In  short, 
it  depends. 

To  summarize,  have  fun  and  dream  the  most  impossible  dreams — 
then  be  realistic  about  what  is  possible  now.  Don't  throw  out  those 
blue  sky  specifications.  Use  them  to  push  the  vendors  to  develop  and 
deliver  better  products.  To  borrow  a  phrase  from  test  pilots:  "Push  the 
envelope."  Test  pilots  push  a  new  airplane  to  the  very  limits  of  its 
power  and  endurance.  It  is  the  only  way  to  prove  and  improve  airplanes. 
New  technology  is  developed  to  correct  the  errors.  Similarly,  consumers 
and  users  must  push  systems  and  their  vendors  to  their  ultimate  limit 
and  set  new  performance  standards  and,  when  these  are  largely  met, 
push  the  vendors  again.  This  means  that  some  of  you  will  need  to 
serve  as  test  sites  for  new  systems.  It  can  be  dangerous  to  be  a  pioneer, 
but  no  progress  is  made  without  risks.  Some  of  the  new  systems  will 
be  successful.  If  you  are  the  test  site  for  such  a  system,  you  will  become 
a  hero  because  of  your  farsightedness.  If  the  system  fails,  you  may  have 
to  pay  a  price — anything  from  censure  to  loss  of  employment.  The  only 
consolation  is  that  you  will  have  advanced  the  state  of  the  art.  If  you 
keep  "pushing  the  envelope"  in  system  development,  in  five  to  seven 
years — when  you  are  ready  to  upgrade  or  replace  your  now  brand  new 
system — you  will  find  many  of  those  old  specifications  are  standard 
equipment. 

Know  what  you  want  to  accomplish,  then  look  for  the  means  to 
achieve  it.  This  is  the  way  to  encourage  development  and  help  your 
"blue  sky"  system  become  a  reality. 


THOMAS  R.  KOCHTANEK 

Chair 

Department  of  Information  Science 

School  of  Library  and  Informational  Science 

University  of  Missouri-Columbia 


Micro  Generations:  Current  and  Future  Directions 


INTRODUCTION 

A  digital  computer  is  a  programmable  device  which  on  the  broadest 
level  supports  the  manipulation  of  symbols  aggregated  as  data.  Simply 
put,  the  computer  is  a  tool  for  creating,  maintaining,  organizing,  storing, 
transmitting  and  disseminating  data  of  all  types.  Developments  in 
computing  hardware  have  a  certain  historical  significance  and  offer 
a  clear  portrait  of  the  role  of  technology  in  society.  Newer  technology 
has  its  roots  in  this  compact  history. 

The  first  section  of  this  paper  traces  the  history  of  computer 
hardware  in  general.  The  second  section  focuses  on  the  evolution  of 
microcomputers  as  a  subset  of  general  computing  systems.  The  third 
section  focuses  on  progress  in  the  area  of  32-bit  microcomputer 
architecture.  The  final  section  ties  those  advancements  in  microcom- 
puting to  existing  and  proposed  database  applications  in  libraries  and 
related  information  agencies. 


GENERAL  HISTORY  OF  COMPUTER  HARDWARE 

A  complete  depiction  of  the  evolution  of  computers  would  include 
details  regarding  the  simultaneous  evolutions  of  hardware,  operating 
systems  and  applications  software.  Hardware  development  provides  the 
raw  resources  for  computing,  operating  systems  provide  real  time  access 
to  these  resources,  and  applications  software  guides  specific  procedures 
to  be  carried  out  by  the  computing  system.  The  objective  of  this  section
is  to  focus  primarily  on  hardware  generations  of  computers,  but
occasional  references  will  be  made  to  developments  in  the  related  areas 
of  software  design. 

The  first  computers  were  built  exclusively  as  prototypes  and  were 
used  primarily  to  perform  highly  accurate  arithmetic  computations  for 
military  and  related  research  needs.  Most  of  the  earliest  computers  were 
invented  at  universities  and  were  supported  by  contracts  from  the  War 
Department,  now  the  Department  of  Defense.  As  the  benefits  of  these 
early  computational  devices  began  to  be  noticed  by  engineers  and 
scientists  alike,  additional  applications  arose  which  transcended  numeric 
processing.  The  differences  in  design  and  function  are  due  primarily 
to  the  manner  in  which  each  applications  program  views  the  stream 
of  data  input  to  the  computer.  In  some  cases  the  data  stream  is  treated 
as  text,  sometimes  as  formulas,  and  other  times  as  data  from  various 
fields  in  a  record. 

The  first  revolution  in  the  design  of  computing  systems  began  in 
1943  and  lasted  until  1950.  Although  there  is  some  dispute  as  to  who 
can  claim  to  have  built  the  first  mainframe  or  maxicomputer,  this  event 
is  primarily  of  historical  concern.  Of  more  enduring  impact  was  the 
introduction  of  the  first  mass-produced  programmable  computer  in  1951. 
The  UNIVAC  I  was  so  important  to  the  fledgling  computing  industry 
that  historians  refer  to  it  as  "the  beginning  of  the  first  generation  of 
computing."  Two  short  years  later,  IBM  began  mass  marketing  a 
mainframe  computer,  the  IBM  650,  which  utilized  punched  cards  in 
a  computer  for  the  first  time.  IBM  quickly  dominated  the  market  with 
its  sales  of  computing  systems. 

These  early  computing  systems  were  unique  with  respect  to  their 
applications  and  users.  Initial  computers  were  programmed  in  machine 
language  only  (using  base  two  zeroes  and  ones)  to  execute  one  request 
at  a  time,  calculating  and  outputting  the  results  for  a  single  end  user. 
There  was  no  internal  or  core  memory,  no  keyboards  or  terminals  and 
no  storage  devices  as  we  now  know  them.  Central  processors  were 
initially  composed  of  large  numbers  of  vacuum  tubes,  although  the 
transistor  had  been  invented  in  1948.  By  the  middle  1950s,  programmers 
were  using  assemblers  in  the  place  of  machine  language  programming 
to  develop  applications  programs.  It  was  not  until  1958  that  FORTRAN, 
ALGOL  and  a  language  called  APL  were  introduced  and  used  as  high 
level  programming  languages. 

The  second  generation  of  computers  (1959-1963)  was  typified  by 
IBM's  1401  mainframe,  developed  in  1959  and  distributed  in  1960.  This 
mainframe  benefitted  from  the  introduction  of  transistors,  which 
replaced  the  bulky  and  failure-prone  vacuum  tubes  of  its
predecessors.  This  system  also  utilized  internal  memory,  supporting
between  1K  and  16K  of  RAM.  Along  with  significant  improvements  to
existing  high  level  languages,  new  languages  were  developed:  COBOL, 
LISP  and  SNOBOL,  to  name  a  few. 

The  third  generation  of  computers  (1964-1967)  was  more  revolutionary
in  its  rapid  development  of  numerous  new  architectures,
languages  and  capabilities.  An  important  new  language,  BASIC,  was 
introduced  in  1964.  A  new  architectural  design  resulted  in  the 
introduction  of  the  first  minicomputer,  the  PDP-8,  from  Digital 
Equipment  Corporation  in  1965.  An  even  more  significant  introduction 
was  the  new  line  of  IBM  360s  introduced  in  1966.  Along  with  the  360 
came  a  promising  new  programming  language  from  IBM,  PL/I.  This 
period  witnessed  the  early  developments  in  time-sharing,  whereby 
certain  resources  of  the  computer,  particularly  main  memory  and 
external  storage,  were  optimally  allocated  among  several  simultaneous 
users. 

The  fourth  generation  of  computers,  from  1968-1974,  was  marked 
by  a  steady  but  somewhat  less  spectacular  growth,  especially  when 
compared  to  the  remarkable  growth  of  the  previous  period.  The  major 
introduction  was  in  the  form  of  a  new  IBM  architecture,  the  IBM  model 
370.  But  the  post-Vietnam  recession  had  its  effects  on  the  computing
industry.  In  1969  numerous  computer  firms  laid  off  significant  portions
of  their  work  force.  Several  corporations  either  folded  or  sold  themselves
to  other  companies.  The  majority  of  changes  were  silent  ones,  such
as  the  growth  in  minicomputer  sales  and  the  rise  of  companies  such  as
Digital  Equipment  Corporation,  which  sold  minicomputers  and  accompanying
services  and  products.  Schools  and  colleges  began  to  purchase
minicomputers  for  administrative  uses  and  to  experiment  with  their 
use  in  classroom  instruction.  Cathode  ray  tubes  became  affordable  and 
supported  enhanced  access  to  these  systems  via  screens  and  keyboards. 
The  use  of  semiconductors  for  internal  memory  became  standard.  The 
distribution  of  smaller,  less  expensive  minicomputers  began  a  silent 
revolution  which  was  soon  to  be  fueled  by  the  introduction  of  the 
affordable  microcomputer  to  the  masses. 


A  MORE  DETAILED  HISTORY  OF  MICROCOMPUTERS 

In  1971  a  company  named  Intel  began  shipping  the  first 
microprocessor,  the  4-bit  4004,  a  complete  central  processor  on  a  single 
silicon  chip.  In  1973  an  improved  version  of  that  first  chip,  the  8080, 
was  shipped  by  Intel.  In  1975  another  company,  MITS,  manufactured 
and  offered  the  first  popular  mail  order  microcomputer  kit  for  $395. 
The  Altair  was  based  on  the  8080  central  processing  unit,  had  256  bytes 
of  RAM,  no  ROM,  no  CRT,  no  keyboard,  no  printer,  and  no  external
disk  storage.  The  Altair  incorporated  twenty-five  switches  for  input  and
thirty-six  blinking  lights  as  output  in  support  of  the  8080  machine
language.  This  offering  was  comparable  to  the  mainframes  of  thirty  years
prior,  in  that  the  computing  system  was  limited  in  its  communication 
capabilities  and  only  supported  a  single  user  possessing  a  high  level 
of  computing  expertise.  The  main  difference  was  the  fact  that  this 
microcomputer  was  affordable  and  was  designed  with  a  more  "open" 
architecture,  allowing  the  addition  of  specific  peripherals  to  the  base 
system.  Eventually,  other  companies  built  add-on  boards,  disk 
controllers,  keyboard  interfaces  and  the  like.  Bill  Gates  invented  a  way 
to  make  this  personal  computer  kit  handle  BASIC.  Gary  Kildall  wrote 
a  single  user  disk-based  operating  system  for  the  8080  called  CP/M. 
Although  the  Altair  never  made  it  as  a  full  production  offering  (MITS 
went  bankrupt)  the  impact  of  this  new  microcomputer  system  was 
resounding.  For  a  few  thousand  dollars,  one  could  have  true  personal 
computing  at  one's  fingertips! 

By  1977  there  were  four  companies  offering  microcomputers  with 
built-in  keyboards:  Radio  Shack  (TRS-80  @  $499),  Commodore  (Pet 
@  $795),  Apple  (Apple  II  @  $970)  and  Processor  Technology  (Sol  20 
@  $1850).  Commodore's  and  Radio  Shack's  offerings  were  considered 
the  bargains  since  they  included  a  monitor  with  the  unit.  In  1979  Texas 
Instruments  and  Atari  entered  the  fray.  Then  in  1980  and  1981,  the  Timex 
Sinclair  and  the  portable  Osborne  were  introduced. 

On  August  12,  1981  the  IBM  PC  was  announced.  To  many,  this 
introduction  signalled  legitimacy  for  the  fledgling  microcomputer 
industry.  Business  began  an  unprecedented  mass  purchase  of  millions 
of  personal  microcomputers  offered  by  IBM  and  its  competitors.  End 
users  of  computing  systems  proliferated  as  these  industry  standard 
microcomputers  took  their  place  on  the  desk  tops  of  corporate  America. 
A  new  development  in  computing,  mass  access  to  personal  workstations, 
brought  literacy,  efficiency  and  productivity  to  individuals  across  many
walks  of  life.  While  large  centralized  systems  operated  primarily  by 
data  processing  departments  still  dominated  corporate  operations, 
millions  of  end  users  experienced  for  the  first  time  personalized 
computing,  an  experience  which  has  made  its  mark  in  computing 
systems  as  we  now  know  them. 

Another  significant  development  in  industry  standard  microcomputers
began  with  the  1984  introduction  of  the  Apple  Macintosh,  a
derivative  of  an  earlier  Apple  product  called  Lisa.  The  first  Macintosh 
used  a  different  microprocessor,  the  Motorola  68000,  and  incorporated 
a  graphics-based  operating  system  capable  of  supporting  easy-to-use 
applications  software.  Most  of  the  ease  was  for  the  end  user  who 
navigated  pull  down  windows  with  a  mouse,  selecting  among  various 
icons  to  interact  with  the  machine.  Developers  of  applications  software
have  found  the  Mac  to  be  somewhat  slow  and  cumbersome  in  developing
business  and  related  software.  One  major  development,  known  as 
HyperCard,  offers  promise  for  future  developments  of  this  offering  from 
Apple. 

IBM  introduced  a  new  line  of  personal  computer  systems  on  April 
2,  1987  known  collectively  as  the  PS/2  line.  While  the  "low  end"  of 
the  line  (models  25  and  30)  offered  minor  improvements  such  as  swifter
processing  speeds  and  smaller  desktop  "footprints,"  the  upper  end  of 
the  line  (models  50  through  80)  offered  extended  memories,  even  faster 
processing  speeds  and  a  new  bus  architecture  called  "micro  channel" 
or  MCA  for  short.  This  architecture  holds  the  promise  for  more 
sophisticated  multi-tasking  applications  development,  an  important 
consideration  in  the  design  of  many  current  database  management 
applications  running  on  microcomputer  systems. 

MS-DOS  Based  Personal  Computers:  A  Review 

A  review  of  events  limited  to  the  IBM  line  of  PC-DOS  products 
may  prove  helpful  to  chronicle  one  segment  of  the  personal 
microcomputer  arena.  As  mentioned  above,  the  original  IBM  PC  was 
announced  to  the  public  on  August  12,  1981.  The  next  developments
included  increased  internal  (RAM)  memories  and  fixed  disk  storage 
capabilities,  released  together  as  the  IBM  PC/XT.  In  1984  IBM 
announced  its  IBM  PC/AT  (for  Advanced  Technology)  based  on  the 
Intel  80286  microprocessor.  Within  three  years,  sales  of  the  AT  exceeded
the  number  of  original  PC  and  PC/XT  systems  produced.  Eighteen  months
after  the  PS/2  line  was  introduced,  the  first  one  million  model  30s 
were  sold.  Other  models  of  the  PS/2  line  are  now  being  purchased 
in  greater  quantities,  some  as  high  performance  personal  workstations, 
others  as  file  servers  within  local  area  networks.  If  a  decision-maker
were  to  limit  his  or  her  selection  solely  to  MS-DOS  based  personal
computers  from  1981  to  the  present,  Table  1  might  reflect  the  personal
purchases  made  from  year  to  year.

These  selections  are  based  on  those  assumptions  of  financial 
restrictions  which  might  apply  to  personal  situations.  Certainly,  more 
powerful  systems  can  be  configured  if  cost  is  no  object.  The  attempt 
is  to  portray  mass  selections,  not  optimal  designs.  The  appearance  of 
non-IBM  equipment  in  the  list  is  indicative  of  a  trend  in  the  industry 
to  offer  lower  cost,  more  powerful  clone  systems  as  competition  to  one 
industry  giant,  the  IBM  standard. 

The  major  trend  is  clearly  in  favor  of  the  80386-  and  80486-based
microcomputer  as  a  machine  which  positions  the  end  user  for  developments
in  both  operating  systems  and  applications  programs.  Worldwide
sales  of  386-based  systems  were  projected  to  reach  4.4  million
systems  in  1989  alone.  This  most  significant  trend  deserves  further 
explanation. 

TABLE  1
MS-DOS  BASED  PERSONAL  COMPUTERS,  1981-PRESENT

Date          Product Description

Summer 1981   IBM PC w/64K RAM, 1 floppy
Summer 1982   IBM PC w/256K RAM, 2 floppies
Summer 1983   IBM PC XT w/640K RAM and 10 Mbyte hard drive
Summer 1984   IBM PC AT w/640K RAM, w/slow 30 Mbyte hard drive,
              1.2 Mbyte floppy
Summer 1985   IBM PC AT w/speedup crystal and large, quick, reliable
              hard drive
Summer 1986   IBM PC clone 80286 (much progress was made by the
              competition)
Summer 1987   IBM PS/2 model 30 or 50
Summer 1988   IBM PS/2 model 50 zero wait state or any competitive
              80286 clone
Summer 1989   Clone 80386 with large, quick drive (Dell 386 and
              Everex Step 386 were big sellers)
Summer 1990   Faster 80386 based systems addressing increased RAM and
              sophisticated peripherals in support of multi-tasking
              or "windowed" applications
Summer 1991   80386 or possibly an 80486 system addressing even larger,
              faster storage devices; possibly an external CD-ROM drive.


A  Review  of  32-Bit  Technologies

The  development  of  the  8-bit  microprocessor  and  accompanying 
peripherals  was  but  an  initial  seed  in  the  harvest  of  microcomputer 
products.  Soon,  demanding  users  moved  up  to  16-bit  technologies  based 
on  two  microprocessors:  Intel's  8086  and  Motorola's  68000.  These  and 
their  powerful  descendants  (the  80286  and  68010)  have  made  their  mark 
while  dominating  the  microcomputer  industry  over  the  past  several 
years.  In  the  quest  for  even  more  processing  capabilities,  the  32-bit 
processor  platform  is  emerging  as  a  new  force  in  the  industry.  Although 
these  processors  have  been  available  for  some  time,  high  demand  and 
mass  production  have  continued  to  lower  costs  well  inside  of  the  $10,000 
mark  for  a  configured  system.  That  amount  is  often  considered  the 
high  water  mark  of  personal  computing  cost.  Business  applications 
involve  different  cost  considerations.

With  these  improved  technologies  comes  a  whole  flood  of
terminology  new  to  the  microcomputer  user:  memory  caching,  virtual
memory,  pipelining,  RISC,  CRISP,  MIPS,  and  so  on.  These  not-so-new
concepts  (many  originated  in  the  mainframe  world)  can  be
examined  in  light  of  additional  facts  regarding  microcomputer-based 
products. 

What  Is  It  about  Word  Length  that  Is  So  Important? 

Word  length  is  a  term  used  to  designate  the  number  of  binary  digits 
that  can  be  processed  at  one  time  inside  the  computer.  Just  as  an 
automobile  with  eight  cylinders  can  deliver  more  useable  horsepower 
than  one  with  four  cylinders,  so  a  central  processing  unit  with  16  or 
32  bits  can  provide  more  raw  computing  power  than  one  with  8  bits. 
The  wider  the  bit  path,  the  more  work  a  CPU  can  do  at  one  time. 
Some  liken  word  length  to  a  highway  of  signals,  where  a  highway  of 
four  lanes  can  provide  higher  transportation  rates  than  one  with  just 
two  lanes.  In  addition  to  these  increased  processing  capabilities,  CPUs 
with  larger  word  lengths  also  possess  such  features  as  higher  clock  rates, 
larger  internal  registers  and  increased  addressable  memory. 
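
The effect of word length can be illustrated with a short sketch. The Python below is not modeled on any particular processor; the function name add_in_words and its parameters are assumptions made purely for illustration. It adds two 32-bit numbers using only additions that are word_bits wide, and counts how many add steps are required, roughly the way a narrower CPU must chain several add-with-carry operations.

```python
def add_in_words(a, b, word_bits, total_bits=32):
    """Add two unsigned total_bits-wide integers using only word_bits-wide adds.

    Returns (sum modulo 2**total_bits, number of add steps performed).
    """
    mask = (1 << word_bits) - 1
    result = 0
    carry = 0
    steps = 0
    for shift in range(0, total_bits, word_bits):
        # Add one word-sized slice of each operand plus the incoming carry.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> word_bits
        steps += 1
    return result & ((1 << total_bits) - 1), steps


for bits in (8, 16, 32):
    total, steps = add_in_words(0x12345678, 0x0FEDCBA8, bits)
    print(f"{bits:2d}-bit words: sum={total:#010x} in {steps} add step(s)")
```

Under these assumptions an 8-bit word needs four add steps, a 16-bit word two, and a 32-bit word one. Real processors differ in many other respects, so the step count is only a rough indication of the advantage.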

What  Applications  Need  the  Kinds  of  Power  Reserved  for  32-bit  Chips?

Faster,  more  powerful  processors  are  needed  not  so  much  for
applications  such  as  recalculating  a  spreadsheet,  but  for  those  types 
of  applications  that  require  high  resolution  graphical  interfaces  and 
large  amounts  of  high  speed  memory.  Databases  are  a  prime  example 
of  these  kinds  of  needs,  since  they  occupy  large  portions  of  fixed  and 
volatile  memories.  Many  applications  can  benefit  from  improved
user  interfaces  that  support  graphics.  As  operating  systems
continue  to  develop,  they  appear  to  be  following  the  trends  set  by  the
Macintosh,  incorporating  pull-down  windows,  overlaid  graphic
windows  and  higher  resolution  screens.  These  features  all  require 
extremely  fast  and  powerful  processors  such  as  those  32-bit  CPUs 
currently  under  refinement. 

How  Does  One  Measure  Internal  Clock  Speeds  and  Cycle  Times?  What 
Are  Wait  States? 

As  soon  as  a  CPU  receives  a  set  of  data  or  instructions,  questions 
of  timing  arise.  How  long  does  the  CPU  store  that  data?  When  does 
the  CPU  refresh  dynamic  RAM?  When  does  the  CPU  move  it?  How 
are  signals  synchronized?  These  issues  are  so  critical  that  logic  with 
memory  is  called  sequential,  as  opposed  to  combinatorial  logic  of 
memory-less  computers.  Sequential  logic  is  kept  synchronized  with  an 
internal  clock.

All  computers  have  internal  clocks.  The  clock's  pulse  is  the
computer's  heartbeat.  One  clock  pulse  is  the  burst  of  current  when  clock
output  =  1.  One  cycle  is  the  interval  from  the  beginning  of  a  pulse
to  the  beginning  of  the  next;  one  Hertz  is  one  cycle  per  second.  Depending  on
the  computer,  the  clock  frequency  may  be  hundreds  or  thousands  or 
millions  of  cycles  per  second.  Megahertz,  or  Mhz,  is  a  measure  indicating 
the  number  of  millions  of  cycles  of  a  CPU  per  second.  Mhz  is  one 
measure  of  a  CPU's  capability  to  perform. 

The  idea  of  using  a  clock  is  that  the  computer's  logical  state  should 
change  only  on  the  clock  pulse.  Ideally,  when  the  clock  strikes  one, 
all  signals  move,  then  stop  on  clock  =  0.  A  condition  known  as  zero 
wait  states  means  the  CPU  never  has  to  pause  for  slower  memory.
One  or  two  wait  states  means  the  CPU  idles  for  that  many  extra  cycles
during  each  transfer  of  binary  data
to  and  from  registers  within  the  CPU. 
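
The cost of wait states can be estimated with simple arithmetic. The sketch below assumes, for illustration only, that a memory transfer takes a minimum of two clock cycles and that each wait state adds one more; the function names and the default of two base cycles are assumptions, not figures taken from any vendor's data sheet.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def memory_access_ns(clock_mhz, base_cycles=2, wait_states=0):
    """Time for one memory transfer, assuming base_cycles clock cycles
    plus one extra cycle per wait state (illustrative assumption)."""
    return (base_cycles + wait_states) * cycle_time_ns(clock_mhz)


for wait_states in (0, 1, 2):
    elapsed = memory_access_ns(20, wait_states=wait_states)
    print(f"20 Mhz, {wait_states} wait state(s): {elapsed:.0f} ns per transfer")
```

Under these assumptions a single wait state at 20 Mhz lengthens each transfer from 100 to 150 nanoseconds, which is why a "zero wait state" system can be noticeably faster than one with the same CPU and slower memory.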

Certain  operations  within  the  CPU  take  more  than  a  single 
instruction  to  perform.  Some  mathematical  operations  normally  take 
many  different  instructions  to  execute  on  a  16-bit  CPU.  A  computer 
with  a  32-bit  word  length  may  process  an  operation  in  one  or  two 
instructions,  thereby  increasing  throughput  efficiency.  Megahertz  alone 
is  not  a  perfect  measure  of  the  raw  computing  speed  of  a  central 
processing  unit.  MIPS  is  often  used  to  designate  millions  of  instructions 
per  second,  as  opposed  to  millions  of  cycles  per  second  (Mhz).  This 
measure  is  dependent  on  the  type  of  instruction  under  consideration. 
For  example,  certain  often-used  instructions  execute  in  a  single  cycle 
of  the  CPU  while  others  require  hundreds  of  cycles  to  execute.  MIPS 
is  calculated  by  determining  the  average  number  of  clock  cycles  a  chip's 
machine  level  instructions  take  to  execute  and  dividing  the  CPU  clock 
speed,  measured  in  Mhz,  by  that  number.  If  a  CPU  can  perform  each 
of  its  binary  instructions  in  one  cycle  of  the  clock  and  that  chip  has 
a  clock  speed  of  10  Mhz,  then  it  would  process  10  million  instructions 
per  second,  or  10  MIPS. 
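
The calculation just described can be written out as a brief sketch. The function names below and the instruction mix in the example are hypothetical, chosen only to show how an average cycle count is formed and then divided into the clock speed.

```python
def average_cycles(instruction_mix):
    """Weighted average of cycles per instruction.

    instruction_mix is a list of (share_of_instructions, cycles) pairs.
    """
    total_share = sum(share for share, _ in instruction_mix)
    return sum(share * cycles for share, cycles in instruction_mix) / total_share


def mips(clock_mhz, avg_cycles_per_instruction):
    """Millions of instructions per second for a given clock speed."""
    return clock_mhz / avg_cycles_per_instruction


# The case from the text: one cycle per instruction at 10 Mhz.
print(mips(10, 1))  # 10.0

# A hypothetical mix: mostly single-cycle instructions, a few very slow ones.
mix = [(0.70, 1), (0.25, 4), (0.05, 20)]
print(round(mips(10, average_cycles(mix)), 2))
```

The second example shows why the measure depends so heavily on the instructions being counted: a small share of slow instructions pulls the rating well below the clock speed.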

What  follows  in  Table  2  are  some  of  the  processing  speeds  of  Intel's 
8,  16  and  32  bit  central  processors.  Each  is  measured  in  millions  of 
cycles  per  second  or  Megahertz  (Mhz)  and  in  millions  of  instructions 
per  second  (MIPS). 

In  comparison,  the  Motorola  68020  CPU  is  rated  at  4.0  MIPS  and 
the  68030  at  6.8  MIPS.  A  change  in  clock  speeds,  instruction  sets  or 
wait  states  can  render  these  comparisons  useless  for  any  specific  case. 
For  example,  the  recently  announced  NeXT  computer  uses  the  Motorola 
DSP56001  CMOS  chip  operating  at  20  Mhz.  However,  this  chip's 
instructions  execute  (on  the  average)  every  two  clock  cycles  to  give  the 
CPU  a  10  MIPS  rating. 

Comparing  these  ratings  to  other,  more  powerful  computers  is 
interesting  as  well.  For  example,  an  early  DEC  VAX  11/780  minicomputer
processes  at  approximately  1  MIPS.  Powerful  workstations  such
as  Sun,  Apollo  and  the  DEC  MicroVAX  house  CPUs  ranging  from  5
to  15  MIPS.  An  IBM  3090  mainframe  operates  at  100  MIPS.  This  indicates
that  the  current  generation  of  microprocessors  is  capable  of  providing
up  to  five  times  as  much  computing  power  as  the  previous  generations
of  minicomputers.  Parallel  processing  supports  the  linking  of  multiple
CPUs  within  a  single  system  unit.  Imagine  linking  ten  10  MIPS  CPUs
together  in  a  single  microcomputer  workstation,  one  that  is  capable
of  providing  as  much  computing  power  as  a  current  generation  IBM
mainframe!

TABLE  2
PROCESSING  SPEEDS  OF  INTEL  CENTRAL  PROCESSORS

CPU      Mhz          MIPS

8088     4.77 Mhz     0.5 MIPS
8086     8-10 Mhz     0.5-1.0 MIPS
80286    10-20 Mhz    0.5-3.0 MIPS
80386    20-35 Mhz    4.0-8.0 MIPS
80486    50 Mhz       speculative

How  Can  A  CPU  Be  Sped  Up?  What  Is  RISC  and  Clock  Speed? 

There  are  two  ways  to  increase  any  particular  chip's  processing 
speed.  One  is  to  step  up  the  clock  speed.  New  improved  versions  of 
existing  CPUs  often  offer  increases  in  clock  speed.  For  example,  the 
80286  as  it  was  first  introduced  in  1984  had  a  clock  speed  of  8  Mhz. 
In  subsequent  updates,  that  same  processor  was  bumped  to  10  and  then 
12  Mhz.  Another  method  is  to  decrease  the  number  of  cycles  needed 
to  execute  instructions.  Two  distinct  approaches  have  been  used  to 
accomplish  this  decrease: 

1.  Some  of  the  more  predominant  CPUs,  such  as  the  80386  from  Intel 
and  the  Motorola  68030,  support  an  approach  called  Complex
Instruction  Set  Computing  (CISC).  This  design  uses  an  on-chip
microcode  program  (software)  to  pre-process  certain  instructions
before  actual  CPU  execution  occurs.  This  type  of  design  reduces  each 
instruction  to  2  to  6  CPU  cycles,  resulting  in  a  full  set  of  instructions 
executing  at  higher  speeds  than  previous  conventional  designs. 

2.  Certain  chips  use  a  modified  and  highly  optimized  internal
instruction  set  called  Reduced  Instruction  Set  Computing  (RISC). 
Several  32-bit  chips  under  development  use  RISC  technology  to 
approach  the  theoretical  limits  of  processing:  single  cycle  instruction 
processing  efficiency,  where  each  instruction  is  processed  within  a 
single  cycle  of  the  CPU.  In  RISC  architectures,  the  CPU's  machine 
language  instruction  set  is  pared  down  to  a  subset  of  fundamental
and  frequently  used  instructions.  The  instructions  themselves  are
optimized  to  execute  directly  in  hardware,  without  the  need  of
microcode.  The  results  are  extraordinary:  RISC  processors  generally
execute  each  basic  instruction  in  1.25  to  2  cycles.  A  RISC  chip  can
operate  up  to  three  times  faster  than  its  non-RISC  counterpart.
These  chips  are  being  developed  primarily  by  high  end  workstation 
companies  such  as  Sun  and  Hewlett  Packard.  IBM  has  offered  a  RISC 
workstation  for  the  past  few  years.  Apple  Computer  is  rumored  to
have  been  developing  a  RISC  chip  for  its  high  end  Macintosh  line. 
Motorola  has  introduced  an  entire  line  of  RISC-based  CPUs  in  the 
88000  family.  Intel  is  also  developing  a  RISC  chip  called  the  80960. 
Experts  envision  the  development  of  CRISP,  for  Complexity  Reduced 
Instruction  Set,  a  combination  of  optimized  CISC  chips  using  RISC 
technology.  Recently  released  chips  such  as  Intel's  80486  and 
Motorola's  68040  are  likely  to  lean  in  this  direction. 
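
The trade-off between the two approaches can be sketched numerically. A RISC program usually needs more instructions than its CISC equivalent, because each instruction does less, but each one finishes in fewer cycles. In the sketch below the instruction counts, the 30 percent increase in instructions for RISC, and the function name are all hypothetical values chosen for illustration; only the cycle ranges come from the discussion above.

```python
def execution_time_ms(instruction_count, cycles_per_instruction, clock_mhz):
    """Run time in milliseconds: total cycles divided by cycles per millisecond."""
    total_cycles = instruction_count * cycles_per_instruction
    return total_cycles / (clock_mhz * 1000.0)


# Hypothetical workload on two 20 Mhz chips.
cisc_ms = execution_time_ms(1_000_000, 4.0, 20)   # 2 to 6 cycles each; 4 assumed
risc_ms = execution_time_ms(1_300_000, 1.5, 20)   # 1.25 to 2 cycles each; 1.5 assumed

print(f"CISC: {cisc_ms:.1f} ms, RISC: {risc_ms:.1f} ms, "
      f"speedup: {cisc_ms / risc_ms:.2f}x")
```

With these assumed figures the RISC chip finishes in about half the time despite executing more instructions. Different assumptions about the instruction mix would move the result in either direction.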

What  Are  Some  Roadblocks  to  Performance? 

The  performance  enhancements  of  microprocessors  are  truly 
remarkable  feats  when  considered  independent  of  other  potential 
bottlenecks  in  a  computing  system.  But  when  any  CPU  is  sped  up  beyond 
20  Mhz,  the  main  impediment  to  performance  becomes  main  memory,
specifically,  dynamic  RAM  (DRAM).  While  DRAM  is  an  excellent  bargain  in
terms  of  price  per  megabyte,  the  fastest  DRAM  chips  available  cannot 
keep  up  with  the  relentless  increases  in  CPU  clock  speeds.  One  solution 
to  the  problem  would  be  to  replace  all  DRAM  chips  with  their  speedy 
counterparts,  static  RAM,  or  SRAM,  but  the  cost  for  several  megabytes 
of  SRAM  would  be  prohibitive. 

Instead,  chip  makers  are  focusing  on  developments  such  as  Single
In-line  Memory  Modules  (SIMMs),  which  hold  up  to  4  megabytes
of  inexpensive  DRAM  on  a  system  board.  This  is  counterbalanced  with
a  small  amount  (between  4K  and  256K  bytes)  of  very  high  speed  static
RAM  installed  as  a  buffer  or  "cache"  between  the  CPU  and  the  DRAM  to
help  that  inexpensive  mass  memory  keep  pace.

Cache  memory  is  a  small  but  high  speed  holding  area  for  data 
that  a  CPU  is  using  or  about  to  use.  Consider  the  situation  where  one 
is  attempting  to  prepare  a  meal  in  one's  own  home.  Perhaps  a  key 
ingredient  is  missing  from  the  cupboard  and  a  trip  to  the  grocery  is 
required.  At  the  grocery,  one  has  the  ability  to  purchase  not  only  the 
specific  items  necessary  for  the  preparation  of  that  particular  meal, 
but  also  additional  items  which  may  be  needed  for  other  preparations. 
So  it  is  with  cache  memory.  High-speed  static  RAM  is  utilized  to  store 
anticipated  data  which  the  CPU  is  likely  to  require  in  ensuing 
processing.

Many  of  the  current  generation  386  PCs  use  this  technique  to
improve  database  access  and  performance.  Since  database  queries  are
disk-intensive  activities,  cache  memory  can  be  used  to  temporarily  store
frequently  accessed  data  likely  to  be  requested  by  the  CPU  for  processing.
Intel  makes  an  82385  controller  which  uses  32K  to  256K  of  35  nanosecond
static  RAM  for  use  as  cache  memory.  The  80386  CPU  in  combination
with  this  controller  can  locate  data  in  the  cache  with  a  95  percent  "hit
rate."
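
To make the idea of a hit rate concrete, the following Python sketch simulates a small cache and then estimates the average memory access time it yields. The least-recently-used replacement policy is a common textbook stand-in and is not claimed to be the policy of the 82385; the 100 nanosecond DRAM figure is an assumption chosen for illustration, and the model ignores the time spent checking the cache before a miss.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Fraction of accesses served from a small least-recently-used cache."""
    cache = OrderedDict()
    hits = 0
    for address in addresses:
        if address in cache:
            hits += 1
            cache.move_to_end(address)      # mark as most recently used
        else:
            cache[address] = True           # load the block from slow memory
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / len(addresses) if addresses else 0.0


def effective_access_ns(hit_rate, cache_ns, dram_ns):
    """Average access time: hits cost cache_ns, misses cost dram_ns."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * dram_ns


# A loop that keeps touching the same four addresses misses only on first use.
trace = [0, 1, 2, 3] * 25
print("hit rate:", lru_hit_rate(trace, capacity=4))

# 95 percent hit rate and 35 ns SRAM are from the text; 100 ns DRAM is assumed.
print("average ns:", effective_access_ns(0.95, cache_ns=35, dram_ns=100))
```

Under these assumptions the average access time stays close to the speed of the static RAM, which is the point of pairing a small cache with a large, slower main memory.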

The  IBM  PS/2  model  70-121  does  not  support  cache  memory.  The 
model  70-121  took  sixteen  seconds  to  query  a  1000  record  Paradox  386 
database.  The  Everex  Step  386/20  uses  cache  memory  and  the  82385 
controller  mentioned  above.  The  elapsed  search  time  for  the  Paradox 
search  was  eight  seconds,  or  half  the  time  of  the  uncached  system.  A 
Dell  System  310  using  cache  memory  recorded  the  same  eight-second 
response  time.  All  hard  drives  for  the  three  systems  had  the  same  twenty- 
five  millisecond  seek  rating. 

Motorola  uses  a  slightly  different  approach  to  cache  memory.  Instead
of  using  separate  SRAM  chips  and  a  controller,  the  68020  and  68030
CPUs  use  a  256-byte  instruction  cache  built  into  the  CPU;  the  68030
adds  a  256-byte  on-chip  data  cache.  By  caching  on  the  chip  and
eliminating  the  external  controller  and  cache  chips,  these  CPUs
display  even  higher  clock  speeds.

Another  technique  is  known  as  pipelining.  A  CPU  is  idle  during
certain  of  the  processes  it  must  perform,  and  it  is  very  routine  in
its  procedures.  It  first  reads  an  instruction  from
memory  and  decodes  that  instruction.  Then  the  CPU  reads  data  from
memory  and  processes  that  data  in  accord  with  the  instruction,  writing
the  results  to  memory.  This  cycle  continues  until  all  instructions  are
processed  or  the  process  aborts.  During  parts  of  this  prescribed  cycle,
the  bus  between  memory  and  the  CPU  sits  idle,  waiting  for  the  CPU  to
access  additional  instructions  or  data.  During  an  idle  moment,  the  CPU
can  be  instructed  to  peek  at  the  next  instruction  or  chunk  of  data,
parking  its  location  or  contents  in  a  special  register.  This
"look  ahead"  technique,  or  pipelining,  increases  the  throughput
of  the  chip.  Both  Motorola  and  Intel  have  introduced
pipelining  in  their  current  68030  and  80386  CPUs.  Certain  RISC  chips
can  perform  up  to  five  tasks  at  once,  thereby  increasing  the  efficiency
of  the  chip.
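
The gain from overlapping steps can be estimated with a simple cycle count. The Python sketch below assumes an idealized pipeline in which every instruction passes through the same number of one-cycle stages and nothing ever stalls; real chips fall short of this, so the figures are an upper bound rather than a benchmark.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction finishes all of its stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """Ideal pipeline: the first instruction fills the pipe, then one completes per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


# Five stages echoes the "up to five tasks at once" figure; 100 instructions is arbitrary.
n, k = 100, 5
print(unpipelined_cycles(n, k), "cycles without overlap")
print(pipelined_cycles(n, k), "cycles with ideal overlap")
```

For long instruction streams the ratio of the two counts approaches the number of stages, which is why deeper overlap raises throughput without raising the clock rate.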

Addressable  Internal  Memory:  What  Good  is  All  That  RAM? 

Earlier  CPUs  had  severe  memory  limitations.  Eight-bit  CPU
architectures  were  limited  to  64K  RAM  and  16-bit  CPUs  had  1M  memory 
limitations.  Addressable  or  user  memory  of  any  particular  system 
offering  varied  depending  on  operating  system  characteristics  and 
address  paths  of  the  internal  bus  architecture.  For  example,  the  original
IBM  PC,  with  twenty  address  lines  on  its  system  bus,  could
theoretically  address  2  to  the  20th  power  bytes,  or  one  megabyte,  of  main
memory.  Because  of  the  storage  requirements  of  the  operating  system
(and  other  features),  the  actual  processing  capacity  of  the  original  IBM
PC  was  reduced  to  640K  of  usable  RAM  for  the  end  user.
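
The relationship between address lines and addressable memory is simple arithmetic: each additional line doubles the number of distinct addresses. The short Python sketch below works through the 16-line and 20-line cases mentioned here, and also computes how much of the one-megabyte space lies above the 640K available to the user. It assumes one byte per address.

```python
def addressable_bytes(address_lines):
    """Number of distinct byte addresses reachable with the given number of lines."""
    return 2 ** address_lines


for lines in (16, 20):
    size = addressable_bytes(lines)
    print(f"{lines} address lines -> {size:,} bytes ({size // 1024:,}K)")

# Space in the one-megabyte map that is not part of the 640K of user RAM.
reserved_bytes = addressable_bytes(20) - 640 * 1024
print(f"above 640K: {reserved_bytes // 1024}K")
```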

Why  would  a  user  need  any  more  than  640K  RAM?  There  are  several 
reasons.  First  is  the  fact  that  complex  applications  programs  can  exceed 
such  a  limit.  As  software  is  developed  and  enhanced,  its  code  can  readily 
exceed  that  barrier.  Another  reason  was  referred  to  earlier:  cache  memory 
can  take  up  additional  addressable  memory  of  the  CPU.  A  third  reason
for  extending  internal  memories  is  the  rise  in  popularity  of
multi-tasking  in  the  microcomputer  environment.  Operating  systems  such
as  UNIX  and  now  OS/2  have  the  capability  to  run  multiple  tasks  or 
applications  in  memory  simultaneously.  This  load  requires  a  much 
larger  internal  capacity  to  store  and  run  these  multiple  applications. 
An  example  of  multi-tasking  would  be  the  simultaneous  loading  of 
a  word  processor,  a  spreadsheet  and  a  database  manager  into  memory. 
A  single  user  could  look  up  information  in  the  database,  calculate 
something  from  that  data  using  the  spreadsheet  program,  and  transfer 
the  result  for  inclusion  in  the  word  processor. 

There  are  two  primary  means  for  providing  large  amounts  of  RAM
in  addition  to  the  "base"  memory.  One  is  to  "extend"  RAM
by  using  a  second  segment  of  RAM  chips  and  "bank  switching"  between
base  and  extended  memory.  This  is  how  the  early  Apple  IIs  could  address
128K  RAM  when  their  architecture  permitted  only  64K  of  directly
addressable  RAM.  Another  method  is  to  incorporate  "virtual"  memory
features,  as  mainframes  and  minicomputers
have  done.  Virtual  memory  is  a  technique  that  allows  a  CPU  with
a  small  amount  of  "real"  memory  to  act  as  if  it  has  more  than
that  amount  of  real  memory.  A  special  chip  is  used  which  responds
to  a  request  for  more  RAM  than  is  physically  present  by  generating
an  "interrupt."  The  operating  system  is  then  asked  to  swap  the
contents  of  RAM  that  is  occupied  but  not  currently  in  use
out  to  physical  disk,  thereby  freeing  up  RAM  for  the  requested
instructions.  By  dedicating  a  segment  of  a  hard  disk  to  virtual  memory,
large  RAM-intensive  applications  programs  can  be  run  on  computing
systems  with  relatively  small  memories.
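
The swapping just described can be sketched as a toy model in Python. The class below is hypothetical and greatly simplified: memory is divided into whole "pages," the page that has gone longest without use is the one written to disk, and a dictionary stands in for the disk. Real memory management hardware and operating systems differ in many details, so this is a sketch of the idea rather than a description of any particular chip.

```python
from collections import OrderedDict


class TinyVirtualMemory:
    """Toy demand-paging model with a fixed number of real memory frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page number -> contents, oldest use first
        self.disk = {}                 # pages swapped out of real memory
        self.page_faults = 0

    def access(self, page):
        """Return the contents of a page, swapping it into real memory if needed."""
        if page in self.resident:
            self.resident.move_to_end(page)   # record the recent use
            return self.resident[page]

        # The page is not in real memory: this is the "interrupt" case.
        self.page_faults += 1
        if len(self.resident) >= self.num_frames:
            victim, contents = self.resident.popitem(last=False)
            self.disk[victim] = contents      # swap the idle page out to disk
        contents = self.disk.pop(page, f"page-{page}")
        self.resident[page] = contents
        return contents


# Two real frames serving a program that touches three pages.
vm = TinyVirtualMemory(num_frames=2)
for page in (0, 1, 2, 0):
    vm.access(page)
print("faults:", vm.page_faults, "resident:", list(vm.resident), "on disk:", list(vm.disk))
```

The program behaves as if three pages of memory were available, at the cost of a disk transfer each time a non-resident page is requested.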

Both  Intel  and  Motorola  incorporate  virtual  memory  options  into
their  CPUs,  such  as  the  80286  and  68020.  For  example,  Motorola's  68020
and  68030  chips  can  access  a  full  4  gigabytes  of  virtual  RAM,  even
though  only  1  or  2  megabytes  are  present.  Intel's  80386  theoretically
can  access  64,000  gigabytes  (64  terabytes)  of  virtual  memory.  The  main
usage  of  virtual  memory  will  be  in  support  of  multi-tasking  processes
for  single  and  multiple  users  of  such  systems.  Current  operating  systems
such  as  MS-DOS,  PC-DOS  and  Apple/Finder  do  not  support  virtual
memory.  Operating  systems  such  as  UNIX  and  refinements  such  as  the
new  versions  of  OS/2  and  Apple's  System  7.0  support  virtual
memory  on  microcomputers.

What  Are  Some  Emerging  Hardware  Developments  that  Offer 
Promise  in  the  Evolution  of  Microcomputer  Technology? 

Great  progress  has  been  made  in  the  development  and  refinement 
of  existing  components  that  make  up  the  microcomputing  system. 
Specific  areas  include  new  chip  designs,  including  CPU  refinement  and 
RAM  developments;  marked  progress  in  storage  media  development; 
the  continued  refinement  of  "supermicros";  and  new  architectures  for 
future  hardware  platforms. 

Developments  in  CPU  refinement  and  RAM  improvements  have 
been  discussed  earlier.  One  of  the  more  explosive  growth  areas  across 
all  levels  of  computing  systems  involves  mass  storage  devices.  The  range 
of  devices  for  microcomputers  begins  with  floppy  disks  and  culminates 
in  such  mass  storage  devices  as  CD-ROM.  The  cost  per  unit  of  stored 
byte  has  been  reduced  drastically  since  mass  produced  microcomputers 
were  first  introduced.  The  optical  storage  technologies  associated  with 
CD-ROM  are  purely  microcomputer-based  and  CD-ROM  access  is  not 
associated  with  computing  systems  beyond  the  microcomputer.  While 
larger  scale  computing  systems  can  access  laser  disk  storage  devices, 
only  micros  have  been  used  to  control  access  to  the  more  popular  CD- 
ROM  products  and  devices. 

A  new  line  of  powerful  top  end  microcomputers,  referred  to  as 
LAN  (Local  Area  Network)  servers,  has  been  introduced  into  the 
marketplace.  These  systems  support  very  large  addressable  internal  and 
external  memories,  process  data  at  very  high  rates,  and  are  capable  of 
hosting  an  interconnection  of  micros  across  a  network  via  cabling  and 
data  exchange  protocols.  The  LAN  design  holds  great  potential  for 
the  refinement  of  small  scale  microcomputer-based  distributed 
applications. 

Another  major  development  is  in  the  area  of  multi-processor
systems,  specifically  parallel  processing.  While  personal
computing  will  most  likely  continue  to  utilize  a  single  16-  or  32-bit 
processor,  perhaps  working  in  tandem  with  a  co-processor  for 
mathematical  computations,  higher  demands  for  computing  power  will 
likely  be  met  using  processors  linked  together  in  a  parallel  configuration. 
Each  CPU  is  dedicated  to  a  specific  task,  such  as  video  display,  general 
input/output  or  printer  output.  The  CPUs  share  a  centralized  internal 
memory.  Excellent  processing  benchmarks  are  associated  with  such 
designs.

Many  of  the  emerging  architectures  for  microcomputer  systems  units
utilize  a  bus  standard  which  supports  such  multi-processor  designs.
These  architectures  include  IBM's  MicroChannel  (MCA)  as  well  as  the
microcomputer  manufacturers'  EISA  standard.  Some  envision  that  future
computers  of  all  sizes  may  ultimately  be  composed  of  32-bit  CPUs 
operating  in  parallel.  To  increase  system  performance  in  one  area,  a 
single  CPU  is  added  with  specific  processing  domains.  If  this  becomes 
a  reality,  then  the  lines  distinguishing  micros  from  minis  from 
mainframes  will  become  increasingly  blurred.  For  example,  Intel  has 
reportedly  developed  a  prototype  system  which  supports  the  parallel 
connection  of  thirty-two  80386  CPUs,  yielding  the  kinds  of  performance 
associated  with  a  top-of-the-line  Cray  mainframe.  Parallel  computing 
systems  offer  the  potential  for  the  development  of  very  specialized, 
intelligent,  shared  database  applications. 

The  field  of  library  automation  has  at  least  one  vendor  currently 
offering  a  product  based  upon  parallel  computing.  One  CLSI  turnkey 
system  utilizes  a  Sequent  parallel  processor  which  supports  the 
installation  of  multiple  CPUs  to  accommodate  growth  as  it  relates  to 
demand  for  increased  processing  capabilities.  The  specifications  and 
benchmarks  for  this  system  indicate  a  marked  increase  in  performance 
over  more  conventional  single  processor  systems.  Yet  another  vendor, 
The  Library  Corporation,  states  the  following  in  its  brochure  for  a
linked  circulation  control  module:  "Dual  386/20  computers  operating
in  parallel  under  a  DOS  or  UNIX  environment  are  supported  by  a 
network  of  distributed  processors." 


CONCLUSION 


Link  to  Database  Management  Systems  Applications 

Database  applications  programs  were  first  introduced  when 
applications  developers  decided  to  treat  individual  blocks  of  data  as 
self-contained  units  and  further  divide  those  units  (called  records)  into 
named  and  addressable  fields.  In  this  manner,  many  similar  records 
pertaining  to  a  certain  application  (e.g.,  online  catalogs)  could  be  easily 
created,  stored,  edited  and  retrieved  for  various  display  or  print  purposes. 
Database  applications  which  support  those  functions  conducted  by 
information  professionals  tend  to  be  extremely  demanding  on  computer 
resources,  both  internal  and  external. 
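
The record-and-field model described above can be illustrated with a few lines of Python. The field names and the two sample entries are invented for the example and do not come from any particular library system; a real catalog record carries many more fields.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    """One self-contained record divided into named fields."""
    record_id: int
    author: str
    title: str
    year: int


class Catalog:
    """Minimal in-memory file of records supporting store, edit, and retrieve."""

    def __init__(self):
        self._records = {}

    def store(self, record):
        self._records[record.record_id] = record

    def edit(self, record_id, **changes):
        record = self._records[record_id]
        for field_name, value in changes.items():
            if not hasattr(record, field_name):
                raise AttributeError(f"no such field: {field_name}")
            setattr(record, field_name, value)

    def retrieve(self, **criteria):
        """Return every record whose named fields match the given values."""
        return [
            record for record in self._records.values()
            if all(getattr(record, name) == value for name, value in criteria.items())
        ]


# Invented sample data.
catalog = Catalog()
catalog.store(CatalogRecord(1, "Smith", "Example Title A", 1980))
catalog.store(CatalogRecord(2, "Jones", "Example Title B", 1985))
catalog.edit(2, year=1986)
print(catalog.retrieve(author="Jones"))
```

Because each field is named and addressable, the same stored records can be selected by author, by year, or by any combination of fields without changing how they are stored.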

Database  management  systems  allow  the  user  to  create,  edit,  store 
and  manipulate  data  of  various  forms  in  electronic  files,  much  as  one 
would  create  and  maintain  manual  files  on  any  given  subject  of  interest. 
The  major  difference  between  manual  and  electronic  data  files  lies  in
the  fact  that  electronic  files  are  much  more  readily  manipulated  and
searched  than  their  manual  counterparts.  Considering  also  that
electronic  files  require  much  less  physical  storage  space  than  manual
files,  one  can  begin  to  see  numerous  situational  advantages  of  these
automated  database  applications.

Many  of  the  hardware  developments  discussed  previously  have  had 
a  major  impact  on  the  design  of  resulting  microcomputer-based  database 
applications.  Chip  technology  is  setting  a  rapid  development  pace, 
improving  CPU  performance  and  at  the  same  time  providing  massive 
amounts  of  high  speed  internal  addressable  memory.  Database 
applications  tend  to  require  very  fast  processors  and  have  the  need  to 
address  large  internal  memories,  especially  those  applications  which 
involve  high  transaction  situations.  Library  applications  are  replete  with 
high  traffic  opportunities  such  as  online  public  access  catalogs, 
circulation  control  systems,  and  automated  reference  services.  In  addition 
to  requiring  speedy  resolution  of  events  and  procedures,  these 
applications  also  demand  very  large  storage  capacities. 

The  development  of  mass  storage  devices,  such  as  high  speed
magnetic  drives  and  high  density  optical  drives,  directly  supports
these  requirements.  Libraries  and  related  information  agencies  are
using  microcomputers  with  high  density  magnetic  hard  drives  capable 
of  storing  up  to  314  megabytes.  Many  also  access  optical  storage  media 
capable  of  storing  over  600  megabytes  per  unit.  Multiple  configurations 
of  these  units  can  currently  provide  gigabytes  of  external  storage  capacity. 

Limitations 

It  seems  as  though  these  "chip  and  disk"  implementations  are  well 
ahead  of  developments  in  systems  and  applications  software.  This  is 
not  an  unusual  phenomenon.  But  what  are  some  limitations  of 
microcomputers  as  hardware  platforms  for  library-related  applications? 
While  the  mass  production  and  purchase  of  small  computing  systems 
certainly  bring  the  cost  per  unit  down,  these  personal  computing  systems 
do  suffer  from  reliability  and  durability  constraints.  Initial  systems  were 
developed  for  use  by  a  single  person  running  a  single  application  in 
RAM.  When  put  through  the  paces  of  multi-user  and  multi-tasking 
applications  which  dominate  the  library  marketplace,  such  systems 
perform  differently  than  in  the  personal  workplace.  Library  automation 
systems  are  required  to  run  night  and  day  in  faultless  fashion,  without 
skipping  a  beat.  They  may  be  called  upon  to  perform  literally  thousands 
of  transactions  in  a  very  short  period  of  time,  say  one  day.  The  mean
time  between  failures  of  such  systems  must  be  very  long.  While
microcomputers  are  fairly  simple  to  repair,  their  ability  to  perform  day 
in  and  day  out  is  suspect.  Most  personal-based  systems  were  not  designed
to  take  that  sort  of  computing  punishment.  Despite  all  the  efforts  and
progress  made  in  developing  such  systems,  the  fact  remains  that 
microcomputers  may  not  be  the  best  performance  purchases  on  the 
market  today,  at  least  not  for  larger  library  automation  projects.  Many 
library  applications  are  multi-tasking,  multi-user  situations  that  require 
tremendous  processing  prowess,  more  than  most  current  personal 
workstations  have  to  offer. 

Nevertheless,  continued  refinement  of  32-bit  technologies  and 
maturity  in  terms  of  connectivity  issues  will  provide  increased 
alternatives  for  information  professionals  in  years  to  come.  As 
microcomputers  improve  their  track  record,  they  may  evolve  as  a  stable 
hardware  platform  for  various  database  storage  and  retrieval  applications 
such  as  those  associated  with  the  automation  of  library  processes.


Desktop  Research  and  Software  Connectivity


ABSTRACT 

"Desktop  research"  encompasses  the  various  tools  that  a  scholar  requires 
in  the  course  of  his  or  her  work.  The  "scholar's  workstation"  of  the 
future  will  involve  several  software  packages  from  a  number  of  developers 
to  accomplish  the  tasks  required  in  doing  research  and  creating 
publications.  To  function  effectively,  the  programs  must  be  able  to 
interact  with  each  other  and  communicate  data.  A  common  user  interface 
will  ease  the  learning  of  each  new  addition  to  the  software  repertoire. 
A  model  workstation  is  discussed  that  allows  searching  of  bibliographic 
databases  or  library  catalogs,  the  assembly  of  bibliographies,  the  ordering 
and  acquisition  of  documents,  and  the  preparation  of  manuscripts.  (The 
workstations  to  support  the  concept  of  desktop  research  were  provided 
under  an  Apple  Library  of  Tomorrow  grant  from  Apple  Computer, 
Inc.) 


SOFTWARE  CONNECTIVITY 

A  great  deal  of  attention  has  been  paid  in  recent  years  to  the 
connections  between  computers  of  various  sorts.  This  connectivity 
includes  the  use  of  microcomputers  to  access  mainframe  or
supercomputers  as  well  as  the  connection  of  a  number  of  microcomputers  into
a  local  area  network  or  workgroup  set  of  computers.  The  advantages
of  connecting  computers  and  establishing  communication  between  them
are  obvious.  But  the  connection  of  two  computers  whose  users  are  working
with  incompatible  software  is  like  two  speakers  in  a  telephone
conversation  speaking  different  languages.  While  there  is  a  physical
connection,  the  substance  is  entirely  lacking.

Software  connectivity  is  the  ability  of  various  software  packages 
to  work  together  in  such  a  way  as  to  make  the  whole  greater  than 
the  sum  of  the  parts.  Although  software  vendors,  as  well  as  vendors 
of  information,  would  like  to  believe  that  a  user  will  use  only  their 
products,  it  is  becoming  increasingly  obvious  that  most  computer  users 
will  be  working  with  a  variety  of  software  packages,  often  to  accomplish 
a  single  task,  and  they  will  require  access  to  many  different  sources 
of  data.  One  answer  that  the  software  industry  tried  was  integrated 
software,  but  this  never  succeeded  to  any  great  extent.  No  one  software 
company  could  achieve  excellence  in  all  the  necessary  software  products, 
and  different  combinations  of  products  are  necessary  for  many 
applications.  Now,  more  and  more  vendors  are  designing  their  software 
with  the  necessary  capabilities  to  work  with  other  vendors'  software. 
The  result  is  the  ability  to  import  and  export  data  readily  between 
applications  and  thus  the  ability  to  process  data  through  several  products 
in  sequence.  There  are  even  some  more  or  less  standard  ways  to  handle 
certain  graphics  and  textual  data.  For  example,  any  spreadsheet  package 
can  read  or  write  Lotus  1-2-3  files.  This  means  that  any  spreadsheet 
program  can  communicate  with  any  other  spreadsheet  package  using 
the  Lotus  "standard"  as  an  intermediate  format. 
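
The role of an intermediate format can be sketched in Python. The example uses comma-separated text purely as a stand-in for a shared format, because the actual Lotus 1-2-3 file layout is not reproduced here; the two functions represent the "export" side of one program and the "import" side of another, and the sample rows are invented.

```python
import csv
import io


def export_rows(rows):
    """Write a table of rows to text in the shared intermediate format."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def import_rows(text):
    """Read a table of rows back from text in the shared intermediate format."""
    return [row for row in csv.reader(io.StringIO(text))]


# Invented sample data; note the comma inside a value survives the round trip.
rows = [["Item", "Quantity"], ["Binding, cloth", "12"], ["Labels", "500"]]
shared_text = export_rows(rows)
print(import_rows(shared_text) == rows)
```

Two programs that each implement only this pair of functions can exchange data without knowing anything about each other's native file layout. Values come back as text, so a receiving program must still decide which fields to treat as numbers.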

Even  more  important  is  the  realization  that  a  standard  user  interface 
for  a  large  number  of  programs  will  reduce  the  learning  necessary  to 
add  software  to  an  individual's  repertoire  and  make  the  transition  from 
one  software  product  to  another  easier.  As  encouraging  as  these 
developments  are,  there  are  also  counter-trends.  For  example,  Apple
Computer,  long  the  leader  and  champion  of  the  standard  interface,  has 
introduced  HyperCard,  a  computer  programming  language  that  makes 
the  interface  the  subject  of  the  software  author's  whim.  Although  it 
is  possible  to  adhere  to  the  Apple  standard  using  HyperCard,  most 
HyperCard  authors  cannot  resist  the  temptation  to  create  an  innovative 
interface  and  perhaps  even  set  yet  another  new  standard.  Similarly,  in 
the  MS-DOS  and  UNIX  world,  there  still  has  not  been  any  substantial
agreement  on  what  a  standard  interface  should  look  like.  A  number 
of  companies  are  attempting  to  create  the  standard  interface. 

This  fierce  battle  over  the  interface  is  an  indication  of  how  large 
the  stakes  for  the  winner  are.  Clearly,  computer  and  software 
manufacturers  realize  the  gains  to  the  developer  of  the  standard  interface 
are  enormous,  and  thus  each  manufacturer  is  struggling  to  establish 
its  own  interface  as  an  industry  standard.  Several  companies  are  even 
getting  together  to  develop  a  standard  while  others  are  filing  lawsuits 
against  each  other  over  ownership  of  the  interface.  The  stakes  have
to  be  high  to  spawn  that  much  cooperation  as  well  as  fighting  among
competing  organizations.  (It  has  been  said  that  the  nice  thing  about 
standards  is  that  there  are  so  many  to  choose  from!) 

Even  without  the  Utopia  of  a  common  interface  and  clear  data 
format  standards,  it  is  still  possible  to  connect  several  software  packages 
together  and  even  transfer  data  between  different  computers  with 
different  operating  systems.  It  is  possible  to  exchange  word  processing 
documents  between  several  products  on  a  single  computer  and  to 
exchange  documents  between  major  word  processors  on  different 
machines.  For  example,  with  Microsoft  Word  or  WordPerfect,  it  is 
possible  to  transfer  a  document  between  an  IBM  PC  and  a  Macintosh. 
Pro-Cite  can  open  a  database  on  an  IBM  machine  from  the  Macintosh 
and  vice  versa.  Databases  can  be  transferred  back  and  forth.  Soon,  UNIX 
systems  will  have  the  same  capability  with  Pro-Cite.  This  interchange 
of  data  between  machines  and  software  packages  makes  it  possible  for 
people  with  different  machines  to  work  together  and  makes  the 
purchasing  choice  of  which  personal  computer  to  buy  a  bit  less 
harrowing. 

The  "Scholar's  Workstation" 

An  example  where  software  connectivity  can  play  a  major  role  is 
in  the  work  of  a  campus  researcher.  In  the  university  of  the  future, 
a  student  or  faculty  member  embarks  on  a  research  project  at  an  advanced 
workstation.  One  example  might  be  a  medical  student  working  on  a 
paper  dealing  with  a  new  drug  treatment  for  AIDS.  He  or  she  begins 
by  searching  three  sources  for  bibliographic  information. 

The  three  sources  to  be  searched  are  MEDLINE  on  a  CD-ROM 
player  next  to  the  personal  computer,  the  BIOSIS  and  Chemical  Abstracts
databases  on  Dialog  Information  Services,  and  the  local  university
library's  online  catalog.  From  these  three  sources,  the  student  will 
assemble  a  collection  of  references  on  the  topic.  Most  citations  will 
include  abstracts.  The  CD-ROM  is  searched  using  software  from  one  of  the
many  providers  of  MEDLINE  on  CD-ROM.  The  Dialog  databases  and  the
online  catalog  are  searched  using  Personal  Bibliographic  Software's  Pro- 
Search.  The  combined  records  are  downloaded  and  converted  to  a 
database  in  the  workstation  using  Pro-Cite  and  Biblio-Links.  Duplicate 
records  are  eliminated,  and  Pro-Cite  is  used  to  produce  a  bibliography 
for  the  paper  in  the  Council  of  Biology  Editors  format  required  by 
the  journal  to  which  the  paper  will  be  ultimately  submitted. 
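
The merging and duplicate-elimination step can be sketched in Python as follows. This is not the method used by Pro-Cite or Biblio-Links, which is not described here; it is a minimal illustration that treats two records as duplicates when their titles match after case and punctuation are ignored and their years agree. The output layout is a simplified author-year-title line, not the actual Council of Biology Editors format, and the sample records are invented.

```python
import re


def normalize_key(record):
    """Build a comparison key from title and year, ignoring case and punctuation."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["year"])


def merge_without_duplicates(*sources):
    """Combine records from several sources, keeping the first copy of each."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(normalize_key(record), record)
    return list(merged.values())


def bibliography(records):
    """Return formatted entries sorted by author and then year."""
    ordered = sorted(records, key=lambda r: (r["author"].lower(), r["year"]))
    return [f'{r["author"]}. {r["year"]}. {r["title"]}. {r["journal"]}.' for r in ordered]


# Invented sample records standing in for three search results.
cd_rom = [{"author": "Smith J", "year": 1980, "title": "An Example Study", "journal": "Journal A"}]
online = [{"author": "Smith J", "year": 1980, "title": "An example study.", "journal": "Journal A"},
          {"author": "Jones K", "year": 1985, "title": "Another Example", "journal": "Journal B"}]
local_catalog = []

combined = merge_without_duplicates(cd_rom, online, local_catalog)
for entry in bibliography(combined):
    print(entry)
```

Matching on a normalized title is deliberately crude; records that differ in spelling or that abbreviate titles would need a more forgiving comparison.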

From  the  Pro-Cite  database,  the  student  can  select  the  documents 
he  or  she  wants  to  examine.  This  selected  set  is  then  sent  by  modem 
to  a  workstation  in  the  library.  The  requested  documents  are  physically
taken  from  the  shelves  and  the  relevant  pages  scanned  into  the  library's
workstation.  These  images  can  then  be  sent  via  fax  modem  to  the
student's  PC.  The  student  will  store  the  document  images  on  the  hard 
disk  of  the  workstation.  Using  optical  character  recognition  (OCR) 
software,  the  articles  will  be  converted  into  ASCII  characters.  The 
student  will  then  use  a  word-processor  to  begin  work  on  the  paper. 
Quotes  from  the  scanned  documents  will  be  inserted  into  the  document 
and  citations  of  the  form  "(Smith,  1980)"  will  be  appended  to  the 
quotations  used.  Charts  and  illustrations  will  also  be  cut  and  pasted 
into  the  paper  with  proper  attribution.  When  the  paper  is  nearing 
completion,  a  bibliography  will  be  generated  automatically  and 
appended  to  the  paper.  When  complete,  the  paper  will  be  sent  via  modem 
to  the  student's  professor.  The  paper  will  go  directly  to  the  professor's 
computer  where  it  can  be  examined.  It  will  also  be  printed  using  a 
laser  printer. 
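
The automatic generation of a bibliography from in-text citations can be sketched in Python. The example assumes citations appear exactly in the "(Smith, 1980)" form with a single surname and a four-digit year, and that the reference database is keyed by surname and year; both are simplifying assumptions, and the sample manuscript and entries are invented.

```python
import re

# Matches citations of the form "(Smith, 1980)".
CITATION_PATTERN = re.compile(r"\(([A-Z][A-Za-z'-]+),\s*(\d{4})\)")


def cited_keys(manuscript):
    """Return (surname, year) pairs in order of first appearance."""
    seen = []
    for surname, year in CITATION_PATTERN.findall(manuscript):
        key = (surname, int(year))
        if key not in seen:
            seen.append(key)
    return seen


def build_bibliography(manuscript, database):
    """Return sorted entries for cited works, plus any citations with no entry."""
    entries, missing = [], []
    for key in cited_keys(manuscript):
        if key in database:
            entries.append(database[key])
        else:
            missing.append(key)
    return sorted(entries), missing


# Invented sample data.
database = {
    ("Smith", 1980): "Smith. 1980. Example Title A.",
    ("Jones", 1985): "Jones. 1985. Example Title B.",
}
manuscript = (
    'An early report called the result "promising" (Smith, 1980). '
    "Later work (Jones, 1985) agreed with that report (Smith, 1980), "
    "although one source could not be traced (Brown, 1979)."
)

entries, missing = build_bibliography(manuscript, database)
print(entries)
print("not in database:", missing)
```

Reporting the citations that have no matching entry is a design choice for the sketch: it lets the writer catch a missing reference before the paper is sent on.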

This  scenario  suggests  how  the  student  or  faculty  member  of  the 
future  will  do  library  research  and  write  the  resulting  paper.  If  laboratory 
work  is  a  part  of  the  research,  the  results  of  the  experiment  can  be 
manipulated  by  computer  and  ultimately  integrated  into  the  paper.  Since 
the  intellectual  property  implications  of  this  process  are  not  yet  fully 
understood,  only  public  domain  documents,  or  documents  where 
appropriate  royalty  has  been  paid,  can  be  used.

Implications  for  the  Library,  the  Publisher,  and  the  User

What  are  some  of  the  implications  of  such  a  scenario?  First,  the 
student  or  professor  does  not  have  to  set  foot  into  the  library  to  get 
relevant  citations  and  does  not  even  have  to  go  to  the  library  to  obtain 
the  needed  documents.  The  current  model  involves  the  removal  of  the 
paper  document  from  the  shelf  and  the  electronic  scanning  of  the 
materials,  a  labor-intensive  manual  process.  Ultimately,  the  library  will 
subscribe  to  a  journal  that  does  not  exist  on  paper,  but
rather  on  a  master  file  server  at  the  publisher.  When  this  happens,  the 
documents  need  not  be  scanned  to  be  sent  to  the  student's  workstation, 
because  they  already  exist  in  that  form  on  the  server.  The  library  will 
then  function  as  a  "switch,"  routing  the  student's  document  request 
to  the  appropriate  server  where  the  library  has  a  subscription.  In  this 
scenario,  students  search  online  databases  themselves  and  own  the  latest 
CD-ROM  databases  needed  for  their  research.  In  fact,  the  same  CD- 
ROM  player  used  for  bibliographic  research  can  double  as  a  music  CD 
player  that  plugs  into  a  stereo  set!

What,  then,  is  the  function  of  the  library,  other  than  to  serve
as  a  museum  for  books?  The  logical  answer  is  that  the  library  will
still  have  to  provide  the  reference  function  of  directing  researchers  to 
sources,  and  this  function  will  become  increasingly  sophisticated.  But 
the  most  important  function  of  the  library  will  be  education  and 
training.  The  technology  will  require  effective  training  and  the  library 
will  be  the  logical  place  for  this  training  function.  In  addition,  the 
library  will  increasingly  take  over  many  of  the  functions  now  in  the 
hands  of  the  computing  centers,  i.e.,  the  maintaining  of  the 
communication  and  computer  equipment  and  software.  The  library 
is  already  the  largest  database  in  most  universities  and  research 
institutions,  and  that  will  continue  even  with  the  new  technologies. 

The  new  technologies  will  have  profound  effects  on  publishers  as 
well.  They  will  no  longer  have  to  cut  down  trees  to  produce  paper 
copies  of  books  and  journals.  Mailing  costs  will  be  reduced,  and  virtually 
all  materials  handling  problems  will  disappear.  Problems  of  preservation 
will  become  moot  as  well,  since  digital  information  is  infinitely 
replicable  without  image  degradation.  Of  course,  the  problems  of 
"information  overload"  will  be  worse  than  ever.  The  amount  of 
information  accessible  to  any  scholar  will  be  many  times  what  it  is 
now,  and  he  or  she  will  still  have  to  sort  it  out  and  sift  out  all  unwanted 
materials.  The  technology  for  the  management  of  information  in  its 
physical  form  is  vastly  outstripping  current  ability  to  retrieve  important 
information  from  the  vast  quantity  of  material  in  the  universe  of 
information. 

Since  the  technology  described  above  allows  text  to  be  converted 
from  paper  to  electronic  form  and  then  transmitted,  there  are  severe 
problems  regarding  the  question  of  ownership  and  control  of  the 
information.  Non-copyrighted  material  in  the  public  domain  is  no 
problem,  but  proprietary  information  cannot  be  used  without  the 
permission  of  the  owner  of  the  copyright.  Getting  the  permissions  may 
prove  to  be  more  difficult  than  getting  the  documents.